Modern Birkhauser Classics 


Many of the original research and survey monographs, as well as textbooks, 
in pure and applied mathematics published by Birkhauser in recent 
decades have been groundbreaking and have come to be regarded 
as foundational to the subject. Through the MBC Series, a select number 
of these modern classics, entirely uncorrected, are being re-released in 
paperback (and as eBooks) to ensure that these treasures remain 
accessible to new generations of students, scholars, and researchers. 


Advanced Calculus 


A Differential Forms Approach 


Harold M. Edwards 


Reprint of the 1994 Edition 
Birkhäuser


Harold M. Edwards 
Courant Institute 


New York University 

New York, NY, USA 

ISSN 2197-1803 ISSN 2197-1811 (electronic) 
ISBN 978-0-8176-8411-2 ISBN 978-0-8176-8412-9 (eBook)


DOI 10.1007/978-0-8176-8412-9
Springer New York Heidelberg Dordrecht London 


Library of Congress Control Number: 2013953495 


© Harold M. Edwards 2014 

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is 
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on 
microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, 
computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal 
reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the 
purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. 
Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the 
Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for 
use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the 
respective Copyright Law. 

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, 
even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations 
and therefore free for general use. 

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the 
authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. 
The publisher makes no warranty, express or implied, with respect to the material contained herein. 


Printed on acid-free paper 


Springer is part of Springer Science+Business Media (www.birkhauser-science.com)


Harold M. Edwards 


Advanced Calculus 
A Differential Forms Approach 


Birkhauser 
Boston - Basel - Berlin 


Harold M. Edwards 
Courant Institute 
New York University 
New York, NY 10012 


Library of Congress Cataloging-in-Publication Data


Edwards, Harold M. 
Advanced calculus : a differential forms approach / Harold M. 
Edwards. -- [3rd ed.] 
p. cm. 
Includes index. 
ISBN 0-8176-3707-9 (alk. paper) 
1. Calculus. I. Title. 


QA303.E24 1993 93-20657 

515--dc20 CIP 
Printed on acid-free paper 
© 1994 Harold M. Edwards
Reprinted 1994 with corrections from the
original Houghton Mifflin edition.


Copyright is not claimed for works of U.S. Government employees. 

All rights reserved. No part of this publication may be reproduced, stored in a retrieval 
system, or transmitted, in any form or by any means, electronic, mechanical, photo- 
copying, recording, or otherwise, without prior permission of the copyright owner. 


Permission to photocopy for internal or personal use of specific clients is granted by 
Birkhauser Boston for libraries and other users registered with the Copyright Clearance 
Center (CCC), provided that the base fee of $6.00 per copy, plus $0.20 per page is paid 
directly to CCC, 21 Congress Street, Salem, MA 01970, U.S.A. Special requests should 
be addressed directly to Birkhauser Boston, 675 Massachusetts Avenue, Cambridge, MA 
02139, U.S.A. 


ISBN 0-8176-3707-9 

ISBN 3-7643-3707-9 

Printed and bound by Quinn-Woodbine, Woodbine, NJ 
Printed in the USA 


98765 


To my students—and especially to those who never 
stopped asking questions. 


Preface to the 1994 Edition 


My first book had a perilous childhood. With this new 
edition, I hope it has reached a secure middle age. 

The book was born in 1969 as an “innovative text- 
book”—a breed everyone claims to want but which usu- 
ally goes straight to the orphanage. My original plan had 
been to write a small supplementary textbook on differen- 
tial forms, but overly optimistic publishers talked me out 
of this modest intention and into the wholly unrealistic ob- 
jective (especially unrealistic for an unknown 30-year-old 
author) of writing a full-scale advanced calculus course that 
would revolutionize the way advanced calculus was taught 
and sell lots of books in the process. 

I have never regretted the effort that I expended in 
the pursuit of this hopeless dream—only that the book was 
published as a textbook and marketed as a textbook, with 
the result that the case for differential forms that it tried 
to make was hardly heard. It received a favorable tele- 
graphic review of a few lines in the American Mathematical 
Monthly, and that was it. The only other way a potential 
reader could learn of the book’s existence was to read an 
advertisement or to encounter one of the publisher’s sales- 
men. Ironically, my subsequent books—Riemann’s Zeta 
Function, Fermat's Last Theorem and Galois Theory—sold 
many more copies than the original edition of Advanced 
Calculus, even though they were written with no commer- 
cial motive at all and were directed to a narrower group of 
readers. 

When the original publisher gave up on the book, it 
was republished, with corrections, by the Krieger Publish- 
ing Company. This edition enjoyed modest but steady sales 
for over a decade. With that edition exhausted and with 
Krieger having decided not to do a new printing, I am enor-
mously gratified by Birkhäuser Boston’s decision to add this
title to their fine list, to restore it to its original, easy-to- 
read size, and to direct it to an appropriate audience. It is 
at their suggestion that the subtitle “A Differential Forms 
Approach” has been added. 

I wrote the book because I believed that differential 
forms provided the most natural and enlightening approach
to the calculus of several variables. With the exception of 
Chapter 9, which is a bow to the topics in the calculus 
of one variable that are traditionally covered in advanced 
calculus courses, the book is permeated with the use of 
differential forms. 

Colleagues have sometimes expressed the opinion that 
the book is too difficult for the average student of advanced 
calculus, and is suited only to honors students. I disagree. 
I believe these colleagues think the book is difficult be-
cause it requires that they, as teachers, rethink the material
and accustom themselves to a new point of view. For stu- 
dents, who have no prejudices to overcome, I can see no 
way in which the book is more difficult than others. On 
the contrary, my intention was to create a course in which 
the students would learn some useful methods that would 
stand them in good stead, even if the subtleties of uniform 
convergence or the rigorous definitions of surface integrals 
in 3-space eluded them. Differential forms are extremely 
useful and calculation with them is easy. In linear alge- 
bra, in implicit differentiation, in applying the method of 
Lagrange multipliers, and above all in applying the gen-
eralized Stokes theorem $\int_{\partial S} \omega = \int_S d\omega$ (also known as
the fundamental theorem of calculus) the use of differential 
forms provides the student with a tool of undeniable useful- 
ness. To learn it requires a fraction of the work needed to 
learn the notation of div, grad, and curl that is often taught, 
and it applies in any number of dimensions, whereas div, 
grad, and curl apply only in three dimensions. 

Admittedly, the book contains far too much material 
for a one-year course, and if a teacher feels obliged to cover
everything, this book will be seen as too hard. Some topics,
like the derivation of the famous equation $E = mc^2$ or the
rigorous development of the theory of Lebesgue integration 
as a limiting case of Riemann integration, were included 
because I felt I had something to say about them which 
would be of interest to a serious student or to an honors 
class that wanted to attack them. Teachers and students 
alike should regard them as extras, not requirements. 

My thanks to Professor Creighton Buck for allowing 
us to reuse his kind introduction to the 1980 edition, to 
Sheldon Axler for his very flattering review of the book 
in the American Mathematical Monthly of December 1982, 
and to Birkhauser for producing this third edition. 


Harold Edwards 
New York 1993


Introduction 


It is always exciting to teach the first quarter (or first semester) 
of a calculus course—especially to students who have not become 
blasé from previous exposure in high school. The sheer power of 
the new tool, supported by the philosophical impact of Newton's 
vision, and spiced by the magic of notational abracadabra, carries 
the course for you. The tedium begins when one must supply all 
the details of rigor and technique. At this point, one becomes 
envious of the physicist who frosts his elementary course by refer- 
ences to quarks, gluons and black holes, thereby giving his students 
the illusion of contact with the frontiers of research in physics. 


In mathematics, it is much harder to bring recent research into an intro- 
ductory course. This is the achievement of the author of the text before 
you. He has taken one of the jewels of modern mathematics—the theory 
of differential forms—and made this far reaching generalization of the 
fundamental theorem of calculus the basis for a second course in calculus. 
Moreover, he has made it pedagogically accessible by basing his approach 
on physical intuition and applications. Nor are conventional topics 
omitted, as with several texts that have attempted the same task. (A quick 
glance at the Index will make this more evident than the Table of 
Contents.) 


Of course this is an unorthodox text! However, it is a far more honest 
attempt to present the essence of modern calculus than many texts that
emphasize mathematical abstraction and the formalism of rigor and logic. 
It starts from the calculus of Leibniz and the Bernoullis, and moves
smoothly to that of Cartan. 


This is an exciting and challenging text for students (and a teacher) who 
are willing to follow Frost's advice and "take the road less travelled by." 


R. Creighton Buck 
January 1980 


Preface 


There is a widespread misconception that math books 
must be read from beginning to end and that no chapter 
can be read until the preceding chapter has been thor- 
oughly understood. This book is not meant to be read 
in such a constricted way. On the contrary, I would like 
to encourage as much browsing, skipping, and back- 
tracking as possible. For this reason I have included a 
synopsis, I have tried to keep the cross-references to a 
minimum, and I have avoided highly specialized notation 
and terminology. Of course the various subjects covered 
are closely interrelated, and a full appreciation of one 
section often depends on an understanding of some other 
section. Nonetheless, I would hope that any chapter of 
the book could be read with some understanding and 
profit independently of the others. If you come to a state- 
ment which you don’t understand in the middle of a 
passage which makes relatively good sense, I would urge 
you to push right on. The point should clarify itself in due 
time, and, in any case, it is best to read the whole section 
first before trying to fill in the details. That is the most 
important thing I have to say in this preface. The rest of 
what I have to say is said, as clearly as I could say it, in 
the book itself. If you learn anywhere near as much from 
reading it as I have learned from writing it, then we will 
both be very pleased. 


New York 1969 


Contents


Chapter 1 Constant Forms
1.1 One-Forms
1.2 Two-Forms
1.3 The Evaluation of Two-Forms, Pullbacks
1.4 Three-Forms
1.5 Summary

Chapter 2 Integrals
2.1 Non-Constant Forms
2.2 Integration
2.3 Definition of Certain Simple Integrals. Convergence and the Cauchy Criterion
2.4 Integrals and Pullbacks
2.5 Independence of Parameter
2.6 Summary. Basic Properties of Integrals

Chapter 3 Integration and Differentiation
3.1 The Fundamental Theorem of Calculus
3.2 The Fundamental Theorem in Two Dimensions
3.3 The Fundamental Theorem in Three Dimensions
3.4 Summary. Stokes’ Theorem

Chapter 4 Linear Algebra
4.1 Introduction
4.2 Constant k-Forms on n-Space
4.3 Matrix Notation. Jacobians
4.4 The Implicit Function Theorem for Affine Maps
4.5 Abstract Vector Spaces
4.6 Summary. Affine Manifolds

Chapter 5 Differential Calculus
5.1 The Implicit Function Theorem for Differentiable Maps
5.2 k-Forms on n-Space. Differentiable Maps
5.3 Proofs
5.4 Application: Lagrange Multipliers
5.5 Summary. Differentiable Manifolds

Chapter 6 Integral Calculus
6.1 Summary
6.2 k-Dimensional Volume
6.3 Independence of Parameter and the Definition of $\int_S \omega$
6.4 Manifolds-with-Boundary and Stokes’ Theorem
6.5 General Properties of Integrals
6.6 Integrals as Functions of S

Chapter 7 Practical Methods of Solution
7.1 Successive Approximation
7.2 Solution of Linear Equations
7.3 Newton’s Method
7.4 Solution of Ordinary Differential Equations
7.5 Three Global Problems

Chapter 8 Applications
8.1 Vector Calculus
8.2 Elementary Differential Equations
8.3 Harmonic Functions and Conformal Coordinates
8.4 Functions of a Complex Variable
8.5 Integrability Conditions
8.6 Introduction to Homology Theory
8.7 Flows
8.8 Applications to Mathematical Physics

Chapter 9 Further Study of Limits
9.1 The Real Number System
9.2 Real Functions of Real Variables
9.3 Uniform Continuity and Differentiability
9.4 Compactness
9.5 Other Types of Limits
9.6 Interchange of Limits
9.7 Lebesgue Integration
9.8 Banach Spaces

Appendices
Answers to Exercises
Index


Synopsis 


There are four major topics covered in this book: con- 
vergence, the algebra of forms, the implicit function 
theorem, and the fundamental theorem of calculus. 

Of the four, convergence is the most important, as well 
as the most difficult. It is first considered in §2.3 where it 
arises in connection with the definition of definite integrals 
as limits of sums. Here, and throughout the book, con- 
vergence is defined in terms of the Cauchy Criterion 
(see Appendix 1). The convergence of definite integrals is 
considered again in §6.2 and §6.3. In Chapter 7 the idea 
of convergence occurs in connection with processes of 
successive approximation; this is a particularly simple 
type of convergence and Chapter 7 is a good introduction 
to the general idea of convergence. Of course any limit 
involves convergence, so in this sense the idea of con- 
vergence is also encountered in connection with the defini- 
tion of partial derivatives (§2.4 and §5.2); in these 
sections, however, stress is not laid on the limit concept 
per se. It is in Chapter 9 that the notions of limit and con- 
vergence are treated in earnest. This entire chapter is de- 
voted to these subjects, beginning with real numbers and 
proceeding to more subtle topics such as uniform con- 
tinuity, interchange of limits, and Lebesgue integration. 

The algebra of forms is the most elementary of the four 
topics listed above, but it is the one with which the reader 
is least likely to have some previous acquaintance. For 
this reason Chapter 1 is devoted to an introduction of the 
notation, the elementary operations, and, most impor- 
tant, the motivating ideas of the algebra of forms. This 
introductory chapter should be covered as quickly as 
possible; all the important ideas it contains are repeated 
in more detail later in the book. In Chapter 2 the algebra 
of forms is extended to non-constant forms (§2.1, §2.4). 
In Chapter 4 the algebra of constant forms is considered
again, from the beginning, defining terms and avoiding 
appeals to geometrical intuition in proofs. In the same 
way Chapter 5 develops the algebra of (non-constant) 
forms from the beginning. Finally, it is shown in Chap- 
ter 6 (especially §6.2) that the algebra of forms corre- 
sponds exactly to the geometrical ideas which originally 
motivated it in Chapter 1. Several important applications 
of the algebra of forms are given; these include the theory 
of determinants and Cramer’s rule (§4.3, §4.4), the theory 
of maxima and minima with the method of Lagrange 
multipliers (§5.4), and integrability conditions for differ- 
ential equations (§8.6). 

The third of the topics listed above, the implicit func- 
tion theorem, is a topic whose importance is too fre- 
quently overlooked in calculus courses. Not only is it the 
theorem on which the use of calculus to find maxima and 
minima is based (§5.4), but it is also the essential in- 
gredient in the definition of surface integrals. More 
generally, the implicit function theorem is essential to the 
definition of any definite integral in which the domain of 
integration is a k-dimensional manifold contained in a 
space of more than k dimensions (see §2.4, §2.5, §6.3, 
§6.4, §6.5). The implicit function theorem is first stated 
(§4.1) for affine functions, in which case it is little more 
than the solution of m equations in n unknowns by the 
techniques of high school algebra. The general (non- 
affine) theorem is almost as simple to state and apply 
(§5.1), but it is considerably more difficult to prove. The 
proof, which is by the method of successive approxima- 
tions, is given in §7.1. Other more practical methods of 
solving m equations in n unknowns are discussed later in 
Chapter 7, including practical methods of solving affine 
equations (§7.2). 

The last of the four topics, the fundamental theorem of 
calculus, is the subject of Chapter 3. Included under the 
heading of the “fundamental theorem” is its generaliza- 
tion to higher dimensions 


$$\int_S d\omega = \int_{\partial S} \omega$$


which is known as Stokes’ theorem. The complete state- 
ment and proof of Stokes’ theorem (§6.5) requires most 
of the theory of the first six chapters and can be regarded 
as one of the primary motivations for this theory. 

In broad outline, the first three chapters are almost 
entirely introductory. The next three chapters are the core 


Synopsis 


xix 


of the calculus of several variables, covering linear alge- 
bra, differential calculus, and integral calculus in that 
order. For the most part Chapters 4-6 do not rely on 
Chapters 1-3 except to provide motivation for the ab- 
stract theory. Chapter 7 is almost entirely independent 
of the other chapters and can be read either before or 
after them. Chapter 8 is an assortment of applications; 
most of these applications can be understood on the 
basis of the informal introduction of Chapters 1-3 and 
do not require the more rigorous abstract theory of 
Chapters 4-6. Finally, Chapter 9 is almost entirely inde- 
pendent of the others. Only a small amount of adjustment 
would be required if this chapter were studied first, and 
many teachers may prefer to order the topics in this way. 


Chapter 1

Constant Forms


The purposes of this chapter are to introduce the notation 
and the algebraic operations of constant forms, and to 
illustrate the sorts of mathematical ideas which are 
described by forms. This notation and these algebraic 
operations are used throughout Chapters 2-8. Readers 
who are perplexed rather than enlightened by such 
primitive physical notions as work and flow should skip 
the discussions involving these ideas. 


1.1 One-Forms

Suppose that a particle is being moved in the presence of
a force field. Each displacement of the particle requires a 
certain amount of work, positive if the force opposes the 
displacement, negative if the force aids the displacement. 
Thus work can be considered as a rule or function assign- 
ing numbers (the amount of work) to displacements of 
the particle. To make the situation as simple as possible 
it will be assumed that the force field is constant; that is, 
it will be assumed that the direction and magnitude of the 
force do not change, either from one point in space to 
another or from one time to another. Moreover, only 
straight line displacements will be considered, to begin 
with, so that work can be considered as a function
assigning numbers (the amount of work) to directed line
segments PQ (the displacement).

H.M. Edwards, Advanced Calculus: A Differential Forms Approach, Modern Birkhäuser Classics,
DOI 10.1007/978-0-8176-8412-9_1, © Harold M. Edwards 2014

*The reader has doubtless seen the symbol dx used in other ways, particularly in integrals and as a differential. In this book dx will always mean a function from directed line segments to numbers. The connection with other uses of the symbol will be explained.

Let A be the amount of work required for the displacement
of the particle from P = (0, 0, 0) to Q = (1, 0, 0).
Since the force field is constant, the amount of work
required to go from any point $P = (x_0, y_0, z_0)$ to the point
$Q = (x_0 + 1, y_0, z_0)$ with the x-coordinate increased by one
is also equal to A; that is, A is the amount of work required
for a unit displacement in the x-direction. Moreover, the
amount of work required for a displacement of h units in the
x-direction, from $P = (x_0, y_0, z_0)$ to $Q = (x_0 + h, y_0, z_0)$,
is hA; that is, going h times as far requires h times as much
work. This holds for all values of h, including, for example,
h = −1, in which case the statement is that −A units of work
are required to go back from P = (1, 0, 0) to Q = (0, 0, 0).

Thus the one number A suffices to determine the 
amount of work required for any displacement in the 
x-direction. Similarly, if B, C are the amounts of work 
required for unit displacements in the y- and z-directions 
respectively, then the amount of work required for any 
displacement in the y- or z-direction can be found. Since 
every displacement is a superposition of displacements 
in the coordinate directions, the amount of work for any 
displacement can be found by addition. For example, 
displacement from P = (3, 2, 4) to Q = (4, 5, 1) involves
a displacement of 1 unit in the x-direction, 3 units in the
y-direction, and −3 units in the z-direction, a total amount
of work equal to A·1 + B·3 + C·(−3). In general, displacement
from $(x_1, y_1, z_1)$ to $(x_2, y_2, z_2)$ requires an amount of work
equal to $A(x_2 - x_1) + B(y_2 - y_1) + C(z_2 - z_1)$.

In summary, work is a function assigning numbers to 
directed line segments, which is of the form 


(1) work = A dx + B dy + C dz 


where A, B, C are numbers and where dx, dy, dz are 
functions* assigning to directed line segments PQ the 
corresponding change in x, change in y, change in z, 
respectively. A function of the type A dx + B dy + C dz 
is called a one-form (1-form). Such functions occur in
many contexts, both mathematical and physical, of 
which work in a constant force field is only one example. 
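Since a constant 1-form is fixed by its three coefficients, evaluating it on a directed segment is a short computation. The sketch below is one possible encoding, not the book's notation: the class `OneForm` and its methods `evaluate` and `along_path` are hypothetical names, and the coefficients A = 1, B = 2, C = −1 used in the demonstration are arbitrary. It applies A dx + B dy + C dz to the text's example segment from (3, 2, 4) to (4, 5, 1), and it also adds up the values over the edges of a polygonal path, which by the superposition argument above gives the same total as the straight segment (compare Exercise 7).

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical type alias: a point of xyz-space as a coordinate triple.
Point = tuple[float, float, float]


@dataclass(frozen=True)
class OneForm:
    """Constant 1-form A dx + B dy + C dz, stored by its three coefficients."""

    A: float
    B: float
    C: float

    def evaluate(self, P: Point, Q: Point) -> float:
        """Value on the directed segment PQ: A*(change in x) + B*(change in y) + C*(change in z)."""
        dx, dy, dz = (q - p for p, q in zip(P, Q))
        return self.A * dx + self.B * dy + self.C * dz

    def along_path(self, points: Sequence[Point]) -> float:
        """Sum of the values on the successive segments P0P1, P1P2, ... of a polygonal path."""
        return sum(self.evaluate(p, q) for p, q in zip(points, points[1:]))


if __name__ == "__main__":
    # Illustrative coefficients only (A = 1, B = 2, C = -1); not taken from the text.
    work = OneForm(1.0, 2.0, -1.0)
    # The text's example segment: work = A*1 + B*3 + C*(-3).
    print(work.evaluate((3, 2, 4), (4, 5, 1)))  # 10.0
    # A detour through other points gives the same total.
    print(work.along_path([(3, 2, 4), (0, 0, 0), (7, -1, 2), (4, 5, 1)]))  # 10.0
```

Reversing the segment negates every coordinate change and hence the value, which is the statement that going back requires minus the work.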

Another example of a one-form arises in connection 
with a constant planar flow. Imagine a thin layer of fluid 
uniformly distributed over a plane and flowing at a 


1.1 


| One-Forms 


3 


constant rate, that is, flowing in such a way that the 
density and velocity of the fluid do not change either 
from one point of the plane to another or from one time 
to another. (With these assumptions the ‘fluid’ could in 
fact be a sheet of tin sliding across the plane.) Given any 
line segment in the plane, one can measure the amount of 
fluid which crosses it in unit time. In this way there is 
associated with the flow a rule or function assigning 
numbers (mass per unit time) to line segments. This 
function ‘flow across’ is not a 1-form, but it becomes a
1-form once the direction of the flow is taken into account
as follows: The flow across a line segment can be in 
either of two directions which can be described as 
‘left-to-right’ or ‘right-to-left’ when left and right are
determined by an observer standing on the plane at P and 
facing toward Q. The direction ‘left-to-right’ will be 
designated as the positive direction and the function 
‘flow across’ will be defined to be the amount of fluid 
which crosses PQ in unit time if it crosses from left to 
right, and minus the mass per unit time if it crosses from 
right to left. Thus in particular the flow across QP is 
minus the flow across PQ. 
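The next paragraph argues that the flow across is a 1-form A dx + B dy. Granting that, each measured flow across a directed segment gives one linear equation in A and B, so the flows across any two non-parallel segments determine the form. The following Python sketch solves the resulting 2 × 2 system by Cramer's rule; the function names `flow_across` and `flow_form_from_two_segments` are hypothetical, and the sketch assumes a constant planar flow and non-parallel segments.

```python
# Hypothetical alias: a point of the xy-plane.
Point2 = tuple[float, float]


def flow_across(A: float, B: float, P: Point2, Q: Point2) -> float:
    """Value of the 1-form A dx + B dy on the directed segment PQ."""
    return A * (Q[0] - P[0]) + B * (Q[1] - P[1])


def flow_form_from_two_segments(
    seg1: tuple[Point2, Point2], f1: float,
    seg2: tuple[Point2, Point2], f2: float,
) -> tuple[float, float]:
    """Recover (A, B) from the flows f1, f2 across two non-parallel directed segments.

    Each segment PQ contributes one linear equation A*dx + B*dy = f, so
    (A, B) solves a 2x2 system; it is solved here by Cramer's rule.
    """
    (P1, Q1), (P2, Q2) = seg1, seg2
    a11, a12 = Q1[0] - P1[0], Q1[1] - P1[1]
    a21, a22 = Q2[0] - P2[0], Q2[1] - P2[1]
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("segments are parallel or degenerate; A and B are not determined")
    A = (f1 * a22 - a12 * f2) / det
    B = (a11 * f2 - a21 * f1) / det
    return A, B


if __name__ == "__main__":
    # Unit segments along the axes give A and B directly.
    print(flow_form_from_two_segments(((0, 0), (1, 0)), 4.0, ((0, 0), (0, 1)), -1.5))  # (4.0, -1.5)
    # Reversing a segment reverses the sign of the flow across it.
    print(flow_across(4.0, -1.5, (0, 0), (2, 1)), flow_across(4.0, -1.5, (2, 1), (0, 0)))  # 6.5 -6.5
```

Taking the two segments to be the unit segments from (0, 0) to (1, 0) and from (0, 0) to (0, 1) returns their flows as A and B, matching the description of A and B given in the text.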

The function defined in this way is a 1-form, that is, it
can be written 


(2) flow across = A dx + B dy


where A, B are numbers and where dx, dy are the func- 
tions ‘change in x’ and ‘change in y’ assigning numbers 
to directed line segments in the xy-plane. The argument 
that flow across is a 1-form can be stated as follows: If 
two line segments are parallel and oriented in the same 
direction then the ratio of the flow across these two 
segments is equal to the ratio of their lengths. In par- 
ticular, the flow across a segment in the x-direction is 
A dx where A is the flow across the segment from (0, 0) 
to (1, 0), and the flow across a segment in the y-direction 
is B dy where B is the flow across the segment from (0, 0) 
to (0, 1). To find the flow across an arbitrary line seg-
ment from $(x_1, y_1)$ to $(x_2, y_2)$, consider the right triangle
whose hypotenuse is the given segment and whose sides 
are parallel to the coordinate axes. The flow across one 
side is A dx and the flow across the other side is B dy. 
But the constancy of the flow implies that the amount of 
fluid inside the triangle is constant, hence the flow into 
the triangle across the hypotenuse is equal to the flow out 
of the triangle across the sides; hence, taking orientations
into account, the flow across the hypotenuse is
A dx + B dy.


Exercises


1 Evaluate the 1-form 2 dx + 3 dy + 5 dz on the line
segment PQ where


(a) P = (0, 1, 0), Q = (0, 0, 1)
(b) P = (3, 12, 4), Q = (11, 14, −7)
(c) P = (−1, 3, −5), Q = (3, −1, −7).


2 (a) Given that the rate of flow (of a constant flow in the 
plane) across the segment PQ is 


3 when P = (2, 1), Q = (3, 1)
1 when P = (−3, 2), Q = (−3, 3)


find the rate of flow across PQ when
P = (3, 4), Q = (0, 0).
(b) Given that the rate of flow across PQ is


5 when P = (4, 2), Q = (6, 3)
2 when P = (−2, 1), Q = (1, 3)


find the 1-form A dx + B dy describing the flow. 


3 (a) If the flow across is given by the 1-form 3 dx − 2 dy,
find several segments across which the flow is zero,
then find the most general segment across which the
flow is zero. Sketch the lines of flow. In which direction
is the flow along these lines (according to the
orientation convention stated in the text)?

(b) Answer the same questions for the case in which flow
across is given by dx + dy.

(c) If flow across is given by A dx + B dy, what are the
lines of flow? Across which points of the plane would
a particle of the fluid which was at (0, 0) at time 0
pass at subsequent times?


4 If the force field is constant and if displacement of a given particle
from (0, 0, 0) to (4, 0, 0) requires 3 units of work,
from (1, −1, 0) to (1, 1, 0) requires 2 units of work, and
from (0, 0, 0) to (3, 0, 2) requires 5 units of work,
find the 1-form describing the function ‘work’.


5 If work is given by the 1-form 3 dx + 4 dy − dz, find all
points which can be reached from the origin (0, 0, 0) without 
work. Describe the set of these points geometrically. Describe 
the direction of the force geometrically. 


6 If 3 dx + 2 dy describes
(a) flow across, draw an arrow indicating the direction of
flow;
(b) work, draw an arrow indicating the direction of the
force.


7 Show that if the work required for straight line displacements
is given by the 1-form A dx + B dy + C dz then the
work required for a displacement along a polygonal curve
$P_0P_1P_2\cdots P_n$ is the same as the work required for displacement
along the straight line $P_0P_n$. (This shows that there is
no loss of generality in considering only straight line displacements
in a constant force field.)


1.2 Two-Forms

*In giving descriptions in xyz-space the direction of increasing x will be taken to be east (left to right), that of increasing y to be north (bottom to top of the page), and that of increasing z to be up (from the page toward the viewer). Thus the flow will be called ‘upward’ rather than ‘in the direction of increasing z’, and the rotational direction (0, 0, 0) → (1, 0, 0) → (0, 1, 0) → (0, 0, 0) will be called ‘counterclockwise’.

[Figure: triangle with vertices P, Q, R, oriented as PQR or QRP or RPQ, not RQP or QPR or PRQ.]


Consider a constant flow in xyz-space. The function 
‘flow across’, which in the case of a planar flow assigned 
numbers to oriented curves, will in this case assign 
numbers to oriented surfaces. For the sake of simplicity, 
only the flow across triangles will be considered, that is, 
flow across will be considered as a function assigning 
numbers to oriented triangles in space. An orientation 
of such a triangle, which is needed to establish the sign 
of the flow across it, can be indicated by drawing a 
rotational arrow ↺ or ↻ in the center of the triangle.
More formally, the orientation can be described by 
naming the vertices in a particular order, thereby indi- 
cating one of the two possible rotational directions in 
which the bounding curve can be traced. The convention 
for determining the sign of the flow across an oriented 
triangle in space is to say that the flow is positive if it is 
in the direction indicated by the thumb of the right hand 
when the fingers are curled in the rotational direction 
which orients the triangle. This convention, like the 
analogous convention for planar flows, is in itself un- 
important; all that is important is the fact that the sign of 
the flow across an oriented triangle can be defined in a 
consistent way. 

Consider first a unit flow in the z-direction, that is, a 
flow in the direction of increasing z which is such that 
unit mass crosses a unit square of the xy-plane in unit 
time.* The mass of the fluid which crosses any triangle 
in the xy-plane in unit time is clearly proportional to the 
area of the triangle, and, because the mass which crosses 
a unit square in unit time is one, the constant of propor- 
tionality is one. Given any triangle in xyz-space, consider 


Chapter 1 


| Constant Forms 6 


*This notation derives from the formula for the area of a rectangle (the x-dimension times the y-dimension), although for other polygons the area is not given by a product. The entire symbol dx dy should be thought of as representing the function ‘oriented area’, which is not, in general, the product of dx and dy.

†Note the reverse order. This is dictated by the orientation convention stated above, as the diagram on the opposite page indicates. An alternative derivation, not depending on a diagram, will be given later (Exercise 6, §1.5).

‡The order in which these terms are written is, of course, a matter of choice. For example, every function A dy dz + B dz dx + C dx dy can also be written A' dx dy + B' dx dz + C' dy dz. The order given above is chosen because, as was just shown, A can then be thought of as the x-component of the flow, B as the y-component, and C as the z-component.


the prism generated by lines parallel to the z-axis through 
points of the triangle. It is clear that the mass of the fluid 
which crosses any triangle formed by intersecting this 
prism with a plane is the same as the mass of the fluid 
which crosses the original triangle, simply because there 
is no flow across the sides and what goes in the bottom 
must come out the top. Thus the amount of flow across 
any triangle in xyz-space is equal to the amount of flow 
across the triangle which is its projection on the xy-plane. 
This in turn is equal to the area of its projection on the 
xy-plane. Finally, taking the signs into account, the flow 
across an oriented triangle is equal to the area of its 
projection on the xy-plane if this projection is oriented 
counterclockwise, minus the area of its projection if the 
projection is oriented clockwise. This conclusion can be 
abbreviated by defining the oriented area of an oriented 
triangle in the xy-plane to be the area if its orientation is
counterclockwise, and to be minus the area if its orientation
is clockwise. Then, for a unit flow in the z-direction,
the flow across any oriented triangle is equal to the 
oriented area of its projection onto the xy-plane. 
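The oriented area can be computed directly from the coordinates of the vertices. The following Python sketch is not from the text: the helper names `oriented_area` and `unit_z_flow` are hypothetical, and it uses the standard determinant formula, which is positive exactly when the vertices are listed counterclockwise. A unit flow in the z-direction is then evaluated on a space triangle by dropping the z-coordinates, that is, by projecting onto the xy-plane as just described.

```python
from fractions import Fraction


def oriented_area(p0, p1, p2):
    """Oriented area of the plane triangle p0, p1, p2 (vertices in that order).

    Positive if the order is counterclockwise, negative if clockwise,
    and zero if the three points are collinear.
    """
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    # Half the determinant of the edge vectors p1 - p0 and p2 - p0.
    return Fraction((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2


def unit_z_flow(q0, q1, q2):
    """Flow of a unit flow in the z-direction across the oriented space
    triangle q0, q1, q2: the oriented area of its projection on the
    xy-plane (the z-coordinates are simply dropped)."""
    return oriented_area(q0[:2], q1[:2], q2[:2])


print(oriented_area((0, 0), (1, 0), (0, 1)))          # 1/2 (counterclockwise)
print(oriented_area((0, 0), (0, 1), (1, 0)))          # -1/2 (clockwise)
print(unit_z_flow((0, 0, 5), (1, 0, 2), (0, 1, -3)))  # 1/2
```

Exact `Fraction` arithmetic avoids rounding for integer or rational coordinates. Note that listing two vertices in the opposite order changes the sign, which is the content of Exercise 4 below.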

The function ‘oriented area’, assigning numbers to 
oriented triangles in the xy-plane, is denoted dx dy.* The 
positive rotational direction designated by dx dy can be 
described independently of pictures and notions such as 
‘clockwise’ by saying that an oriented triangle described 
by going from the origin to a point on the positive x-axis 
to a point on the positive y-axis and back to the origin is 
positively oriented. Then it is natural to use dy dx to 
denote −dx dy, that is, to denote the function ‘oriented
area’ in which the orientation of a triangle from the 
origin to the positive y-axis to the positive x-axis to the 
origin is positive. 

Thus unit flow in the z-direction is described by the 
function dx dy assigning to oriented triangles in space the 
oriented area of their projections on the xy-plane. 
Similarly, unit flow in the x-direction is described by 
dy dz and unit flow in the y-direction by dz dx.† Writing
an arbitrary flow as a superposition of A times the unit 
flow in the x-direction plus B times unit flow in the 
y-direction plus C times unit flow in the z-direction leads 
to the conclusion that for an arbitrary constant flow 


$$\text{flow across} = A\,dy\,dz + B\,dz\,dx + C\,dx\,dy \qquad (1)$$


for some numbers A, B, C. A function of the type
A dy dz + B dz dx + C dx dy‡ is called a two-form
(2-form). The conclusion (1) is therefore that the function
‘flow across’ associated with a constant flow in xyz-space is a 2-form.

[Figure: panels as seen from z > 0, from x > 0, and from y > 0.]

A 2-form A dy dz + B dz dx + C dx dy is thus a
function assigning numbers to oriented triangles in
xyz-space. Of course it also assigns numbers to other
oriented polygons, such as rectangles and parallelograms,
because the notion of area is just as meaningful for
polygons of arbitrarily many sides as it is for triangles.
Moreover, a 2-form assigns numbers to oriented polygonal
surfaces (just as a 1-form assigns numbers to
oriented polygonal curves), the value of the 2-form on
the entire surface being defined to be the sum of its values
on the individual pieces of the surface. The values of
2-forms on oriented triangles will be emphasized because
these are the simplest oriented polygonal surfaces to
describe, but the reader should continue to think of a
2-form as a function assigning numbers to arbitrary
polygonal surfaces.

The actual evaluation of 2-forms on particular polygons
is less trivial than the evaluation of 1-forms. The
technique of evaluation is described in the next section.

Exercises
1 Find the value of the 2-form dx dy + 3 dx dz on the
oriented triangle with vertices 


(0,0,0), (1,2,3), (1, 4, 0) 


in that order. [Draw pictures of the projection of the triangle 
in the relevant coordinate planes.] 


2 Find the value of the 2-form dy dz + dz dx + dx dy on 
the oriented triangle with vertices 


(1, 1, 2), (3, 5, −1), (4, 2, 1)
in that order. 


3 Give a necessary and sufficient condition for the oriented 
area of the triangle (0, 0), (x₁, y₁), (x₂, y₂) to be positive.
[The equation y₁x − x₁y = 0 is the equation of the line
through (0, 0) and (x₁, y₁). The sign of the number y₁x − x₁y
tells which side of this line the point (x, y) lies on.]


4 Give a necessary and sufficient condition for the oriented 
area of the triangle (x₁, y₁), (x₂, y₂), (x₃, y₃) to be positive.
[Use Exercise 3.] Show that interchanging two vertices changes 
the sign of the oriented area. 
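Since each basic 2-form is the oriented area of a projection, a 2-form A dy dz + B dz dx + C dx dy can be evaluated on an oriented triangle in space by projecting the triangle onto the three coordinate planes. The Python sketch below does this (function names are hypothetical); the projection used for dz dx lists the z-coordinate first, matching the order of the symbol. As an independent check it also computes half the dot product of (A, B, C) with the cross product of two edge vectors, a standard vector formula that the text does not use. Both give −12 for the 2-form dy dz − 2 dx dy on the triangle (1, 0, 1), (2, 4, 1), (−1, 2, 0), the value found by pullbacks in §1.3.

```python
from fractions import Fraction


def oriented_area(p0, p1, p2):
    """Oriented area of a plane triangle: positive if counterclockwise."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    return Fraction((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2


def eval_two_form(A, B, C, q0, q1, q2):
    """Value of A dy dz + B dz dx + C dx dy on the oriented triangle q0 q1 q2,
    computed from the oriented areas of its three projections."""
    tri = (q0, q1, q2)
    yz = [(y, z) for (x, y, z) in tri]   # dy dz: projection on the yz-plane
    zx = [(z, x) for (x, y, z) in tri]   # dz dx: note the order (z, x)
    xy = [(x, y) for (x, y, z) in tri]   # dx dy: projection on the xy-plane
    return A * oriented_area(*yz) + B * oriented_area(*zx) + C * oriented_area(*xy)


def eval_two_form_cross(A, B, C, q0, q1, q2):
    """The same value as half of (A, B, C) . ((q1 - q0) x (q2 - q0))."""
    u = [q1[i] - q0[i] for i in range(3)]
    v = [q2[i] - q0[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    return Fraction(A * n[0] + B * n[1] + C * n[2]) / 2


tri = ((1, 0, 1), (2, 4, 1), (-1, 2, 0))
print(eval_two_form(1, 0, -2, *tri))        # -12
print(eval_two_form_cross(1, 0, -2, *tri))  # -12
```

The projection version mirrors the definition in the text; the cross-product version is only a cross-check.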




1.3 The Evaluation of Two-Forms. Pullbacks

[Figure: the map (1) carries triangles in the uv-plane to triangles in the xy-plane.]

*A function y = f(x), in which range and domain are 1-dimensional, can be pictured geometrically by means of its graph, a curve in two dimensions. However, the graph of a function in which range and domain are 2-dimensional is in 4-dimensional space. Since geometrical space is only 3-dimensional, the graph therefore cannot be visualized. However, such functions can be visualized as mappings of one plane onto another, which is the origin of this term (see Exercise 7).


The algebraic rules which govern computations with 
forms all stem from the following fact: Let 


$$\begin{aligned} x &= au + bv + c \\ y &= a'u + b'v + c' \end{aligned} \qquad (1)$$


be a function assigning to each point of the uv-plane a 
point of the xy-plane (where a, b, c, a’, b’, c’ are fixed 
numbers). A function from the uv-plane to the xy-plane 
is also called a mapping* or a map instead of a function, 
and a mapping of the simple form (1) (in which the 
expressions for x, y in terms of u, v are polynomials of 
the first degree in u, v) is called an affine mapping. Given 
an oriented polygon in the uv-plane, its image under the
affine mapping (1) is an oriented polygon in the xy-plane.
For example, the oriented triangle with vertices (u₀, v₀),
(u₁, v₁), (u₂, v₂) is carried by the mapping (1) to the
oriented triangle (x₀, y₀), (x₁, y₁), (x₂, y₂), where

$$x_0 = au_0 + bv_0 + c, \qquad y_0 = a'u_0 + b'v_0 + c'$$

and similarly for x₁, y₁, x₂, y₂. It will be shown that the
oriented area of the image of any oriented polygon under
the map (1) is ab' − a'b times the oriented area of the
polygon itself. That is, the map (1) ‘multiplies oriented
areas by ab' − a'b’, a fact which is conveniently
summarized by the formula

$$dx\,dy = (ab' - a'b)\,du\,dv. \qquad (2)$$


Examples 


The affine mapping

$$x = -v, \qquad y = u$$

carries (0, 0) to (0, 0), (1, 0) to (0, 1), (0, 1) to (−1, 0),
(3, 2) to (−2, 3), etc., and can be visualized as a rotation
of 90° in the counterclockwise direction. For this mapping
ab' − a'b is 0·0 − (1)(−1) = 1, hence dx dy =
du dv, that is, the oriented area of the image of any
polygon is the same as its oriented area. In short, a
rotation of 90° does not change oriented areas.
The affine mapping

$$x = -u, \qquad y = v$$

can be visualized as a reflection. In this case ab' −
a'b = −1; hence, dx dy = −du dv. This says that the
reflection leaves areas unchanged but reverses orientations,
a fact which is clear geometrically.
The affine mapping

$$x = u + 3v, \qquad y = v$$

can be visualized as a shear. In this case dx dy = du dv,
i.e. areas and orientations are unchanged by the mapping,
which agrees with geometrical intuition.

The affine mapping 


$$x = 3u, \qquad y = v$$


changes lengths in the horizontal direction by a scale 
factor of 3 and leaves lengths in the vertical direction 
unchanged. Here dx dy = 3 du dv; that is, orientations 
are unchanged and areas are trebled. 

An affine mapping 


$$x = u + c, \qquad y = v + c'$$


is a translation. In this case dx dy = du dv, that is, 
oriented areas are unchanged. 
The affine mapping

$$x = u, \qquad y = 0$$

collapses the entire uv-plane to the x-axis. Thus the
image of any polygon is a ‘polygon’ of zero area. This
agrees with the formula dx dy = 0 · du dv given by (2).
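These examples can be checked numerically. In the Python sketch below (names are hypothetical; the test triangle and the translation constants c = 5, c' = −2 are arbitrary choices, not from the text), each map is given by its six coefficients a, b, c, a', b', c', written `a1`, `b1`, `c1` for the primed ones. The sketch checks that the oriented area of the image of the triangle equals ab' − a'b times its original oriented area.

```python
from fractions import Fraction


def oriented_area(p0, p1, p2):
    """Oriented area of a plane triangle: positive if counterclockwise."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    return Fraction((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2


def affine(a, b, c, a1, b1, c1):
    """The map x = a u + b v + c, y = a1 u + b1 v + c1 (a1, b1, c1 stand for a', b', c')."""
    return lambda p: (a * p[0] + b * p[1] + c, a1 * p[0] + b1 * p[1] + c1)


examples = {
    "rotation by 90 degrees": (0, -1, 0, 1, 0, 0),
    "reflection": (-1, 0, 0, 0, 1, 0),
    "shear": (1, 3, 0, 0, 1, 0),
    "horizontal scale factor 3": (3, 0, 0, 0, 1, 0),
    "translation": (1, 0, 5, 0, 1, -2),
    "collapse to the x-axis": (1, 0, 0, 0, 0, 0),
}

triangle = [(0, 0), (3, 1), (1, 2)]   # arbitrary test triangle, oriented area 5/2
factors = {}
for name, (a, b, c, a1, b1, c1) in examples.items():
    f = affine(a, b, c, a1, b1, c1)
    factor = a * b1 - a1 * b                          # ab' - a'b
    image = [f(p) for p in triangle]
    assert oriented_area(*image) == factor * oriented_area(*triangle)
    factors[name] = factor
    print(f"{name}: dx dy = {factor} du dv")
```

A single triangle cannot prove formula (2), of course; the sketch only confirms the six cases worked out above.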

A rigorous proof of formula (2) would require a 
rigorous definition of area as a double integral. In- 
formally, however, the plausibility of the formula is 
easily deduced from the intuitive idea of area as follows: 




[Figure: the map (1) carries the unit square in the uv-plane to a parallelogram in the xy-plane.]

*This is true only roughly because this number is defined only roughly: some squares neither lie in PQR nor lie outside PQR but lie part way in and part way out. The number of such borderline cases is insignificant relative to the number of squares inside.

†If A = 0 then the map (1) collapses the uv-plane to a single line (or, if a, b, a', b' are all zero, to a single point) in the xy-plane. Therefore all polygons are collapsed to figures with zero area, and dx dy = A du dv = 0 holds.


It is clear that for any given affine mapping (1) there is
some number, say A, such that dx dy = A du dv, that
is, such that the oriented area of the image in the xy-plane
of any oriented polygon in the uv-plane is A times
its oriented area. To reach this conclusion, consider first
the unit square in the uv-plane with vertices (0, 0),
(1, 0), (1, 1), (0, 1) in that order. Its image under the map
(1) is an oriented parallelogram in the xy-plane. Let A be
the oriented area of this parallelogram. It is to be shown
that if PQR is any oriented triangle in the uv-plane then
the oriented area of the image of PQR is A times the
oriented area of PQR itself. Imagine the uv-plane to be
ruled off into very small squares, say by the lines


$$u = \frac{\text{integer}}{1{,}000}, \qquad v = \frac{\text{integer}}{1{,}000},$$


dividing it into squares which are (1,000)⁻¹ on a side.
Then the area of PQR is roughly equal to (1,000)⁻²
times the number of squares which lie in PQR.* Now the
image of this ruling of the uv-plane is a ruling of the
xy-plane into small parallelograms. These small parallelograms
all have the same area and, since (1,000)² of them
make up the parallelogram which is the image of the unit
square, this area must be |A| · (1,000)⁻², where |A| is the
absolute value of A. The image of the triangle PQR
contains roughly (area PQR) × (1,000)² of these small
parallelograms, each of area |A| × (1,000)⁻²; hence the
area of the image of PQR is |A| times the area of PQR.
It is easy to see that if A > 0 then the map preserves all
orientations (carries clockwise triangles to clockwise
triangles and counterclockwise to counterclockwise)
whereas if A < 0 then the map reverses all orientations.†
Thus the oriented area of the image of PQR is A times
the oriented area of PQR, as was to be shown.
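The counting argument can be imitated numerically. The Python sketch below is an illustration under stated assumptions, not a proof. It uses squares of side 1/n with n = 200 rather than 1/1,000 (to keep it fast), counts a square as lying in PQR when its center does, and uses an arbitrarily chosen triangle and the hypothetical map x = 2u + v, y = u + 3v, for which A = ab' − a'b = 5. Because an affine map carries the center of each small square to the center of the corresponding small parallelogram, |A| times the count divided by n² should approximate the area of the image of PQR.

```python
import math


def oriented_area(p0, p1, p2):
    """Oriented area of a plane triangle (floating-point version)."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    return ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2


def inside(p, tri):
    """True if p lies strictly inside the triangle tri (either orientation)."""
    s = [oriented_area(tri[i], tri[(i + 1) % 3], p) for i in range(3)]
    return all(x > 0 for x in s) or all(x < 0 for x in s)


def count_squares(tri, n):
    """Number of squares of side 1/n, cut out by the lines u = integer/n and
    v = integer/n, whose centers lie inside tri."""
    us = [p[0] for p in tri]
    vs = [p[1] for p in tri]
    count = 0
    for i in range(math.floor(min(us) * n), math.ceil(max(us) * n)):
        for j in range(math.floor(min(vs) * n), math.ceil(max(vs) * n)):
            if inside(((i + 0.5) / n, (j + 0.5) / n), tri):
                count += 1
    return count


a, b, a1, b1 = 2, 1, 1, 3       # hypothetical map x = 2u + v, y = u + 3v
A = a * b1 - a1 * b             # ab' - a'b = 5


def f(p):
    u, v = p
    return (a * u + b * v, a1 * u + b1 * v)


PQR = [(0.1, 0.2), (0.9, 0.35), (0.3, 0.8)]
n = 200                         # squares of side 1/200 instead of 1/1,000
count = count_squares(PQR, n)
image = [f(p) for p in PQR]
print("area of PQR:", abs(oriented_area(*PQR)), " estimate:", count / n**2)
print("area of image:", abs(oriented_area(*image)), " estimate:", abs(A) * count / n**2)
```

The estimates differ from the exact areas only through the borderline squares mentioned in the footnote, and the discrepancy shrinks as n grows.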

Now it must be shown that A is given by the formula 
A = ab' − a'b. This formula is certainly correct for the
simple affine maps considered in the examples above. 
The general case will follow from these particular cases if 
it is shown that 


(i) if (2) is true of two affine maps then it is true of 
their composition, and 

(ii) every affine mapping is a composition of re- 
flections, rotations by 90°, shears, translations, 
and multiplications of the coordinate directions 
by scale factors. 




Statement (ii) is plausible geometrically and is not dif- 
ficult to prove algebraically (Exercise 8). Statement (i) 
can be verified as follows: 

Let two affine maps 


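$$\begin{aligned} u &= \alpha r + \beta s + \gamma \\ v &= \alpha' r + \beta' s + \gamma' \end{aligned} \qquad\qquad \begin{aligned} x &= au + bv + c \\ y &= a'u + b'v + c' \end{aligned}$$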
be given. Assuming that the first map multiplies oriented
areas by αβ' − α'β and that the second multiplies
oriented areas by ab' − a'b, it follows that the composed
map multiplies oriented areas by (αβ' − α'β)(ab' − a'b).
On the other hand, the composed map is given explicitly 
as a map of the rs-plane to the xy-plane by 


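$$\begin{aligned} x &= a(\alpha r + \beta s + \gamma) + b(\alpha' r + \beta' s + \gamma') + c \\ y &= a'(\alpha r + \beta s + \gamma) + b'(\alpha' r + \beta' s + \gamma') + c' \end{aligned}$$

that is

$$\begin{aligned} x &= (a\alpha + b\alpha')r + (a\beta + b\beta')s + (a\gamma + b\gamma' + c) \\ y &= (a'\alpha + b'\alpha')r + (a'\beta + b'\beta')s + (a'\gamma + b'\gamma' + c'). \end{aligned}$$

Thus the statement to be proved is

$$(ab' - a'b)(\alpha\beta' - \alpha'\beta) = (a\alpha + b\alpha')(a'\beta + b'\beta') - (a'\alpha + b'\alpha')(a\beta + b\beta')$$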


*The letter d represents the operation, 
here purely formal and algebraic, of 
taking a differential. 


which is easily verified. 
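Before (or after) verifying the identity by hand, it can be spot-checked by machine. The Python sketch below is not a proof. It draws random integer coefficients for the two maps (`alpha`, `beta`, `alpha1`, `beta1` stand for α, β, α', β', and `a1`, `b1` for a', b'; the translation constants γ, γ', c, c' are omitted because they do not enter the identity) and compares the two sides exactly.

```python
import random


def composition_identity_holds(trials=1000, seed=0):
    """Spot-check (ab' - a'b)(alpha beta' - alpha' beta) against the
    r, s coefficients of the composed map, using random integers."""
    rng = random.Random(seed)
    for _ in range(trials):
        alpha, beta, alpha1, beta1 = (rng.randint(-9, 9) for _ in range(4))
        a, b, a1, b1 = (rng.randint(-9, 9) for _ in range(4))
        lhs = (a * b1 - a1 * b) * (alpha * beta1 - alpha1 * beta)
        # Coefficients of r and s in x and y for the composed map.
        x_r, x_s = a * alpha + b * alpha1, a * beta + b * beta1
        y_r, y_s = a1 * alpha + b1 * alpha1, a1 * beta + b1 * beta1
        rhs = x_r * y_s - y_r * x_s
        if lhs != rhs:
            return False
    return True


print(composition_identity_holds())  # True
```

Integer arithmetic keeps the comparison exact, so any failure would reveal a genuine algebraic error rather than rounding.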

There is no need to memorize formula (2) because it 
can be derived immediately from the formal algebraic 
rules* 


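$$\begin{aligned} &du\,du = 0, \quad dv\,dv = 0, \quad du\,dv = -dv\,du, \\ &d(au + bv + c) = a\,du + b\,dv, \\ &d(a'u + b'v + c') = a'\,du + b'\,dv, \end{aligned}$$

which give

$$\begin{aligned} dx\,dy &= (a\,du + b\,dv)(a'\,du + b'\,dv) \\ &= aa'\,du\,du + ab'\,du\,dv + ba'\,dv\,du + bb'\,dv\,dv \\ &= (ab' - a'b)\,du\,dv. \end{aligned}$$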


The 2-form (ab' − a'b) du dv is called ‘the pullback of
the 2-form dx dy under the affine map (1)’. The name
‘pullback’ derives from the fact that the map goes from
the uv-plane to the xy-plane while the 2-form dx dy on
the xy-plane ‘pulls back’ to a 2-form on the uv-plane;
that is, the pullback goes in the direction (xy to uv)
opposite to the direction of the map itself (uv to xy).

The pullback should be considered to be defined
by the algebraic rules du du = 0, du dv = −dv du,
d(au + bv + c) = a du + b dv, etc., and not by the
formula (2). Then the same rules serve to define the
pullback of a 2-form A dy dz + B dz dx + C dx dy
under an affine map

$$\begin{aligned} x &= au + bv + c \\ y &= a'u + b'v + c' \\ z &= a''u + b''v + c'' \end{aligned}$$

of the uv-plane to xyz-space. (A 2-form in xyz pulls
back under the map to give a 2-form in uv.) One merely
performs the substitution and applies the algebraic rules
to obtain


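$$\begin{aligned} &A(a'\,du + b'\,dv)(a''\,du + b''\,dv) + B(a''\,du + b''\,dv)(a\,du + b\,dv) + C(a\,du + b\,dv)(a'\,du + b'\,dv) \\ &\quad = Aa'a''\,du\,du + Aa'b''\,du\,dv + \cdots \\ &\quad = \bigl[A(a'b'' - a''b') + B(a''b - ab'') + C(ab' - a'b)\bigr]\,du\,dv. \end{aligned}$$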
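The formal rules translate into a few lines of code. In the Python sketch below (names are hypothetical), a 1-form p du + q dv is stored as the pair (p, q). Multiplying two such 1-forms with du du = dv dv = 0 and dv du = −du dv leaves only a multiple of du dv, and applying this to d(au + bv + c) = a du + b dv reproduces the bracketed coefficient above. On the triangle (0, 0), (1, 0), (0, 1), whose oriented area du dv is 1/2, the value of the pullback is half of its coefficient. The usage lines anticipate the worked example that follows, where the answer is −12.

```python
from fractions import Fraction


def wedge(alpha, beta):
    """Coefficient of du dv in the product of two 1-forms in u, v.

    alpha = (p, q) stands for p du + q dv.  Multiplying out,
    (p du + q dv)(r du + s dv) = pr du du + ps du dv + qr dv du + qs dv dv,
    and the rules du du = dv dv = 0, dv du = -du dv leave (ps - qr) du dv.
    """
    p, q = alpha
    r, s = beta
    return p * s - q * r


def pullback_two_form(A, B, C, x_coeffs, y_coeffs, z_coeffs):
    """Coefficient of du dv in the pullback of A dy dz + B dz dx + C dx dy
    under x = a u + b v + c, y = a'u + b'v + c', z = a''u + b''v + c''.

    The arguments x_coeffs, y_coeffs, z_coeffs are the pairs (a, b), (a', b'),
    (a'', b''); the constants c, c', c'' drop out because d(constant) = 0.
    """
    dx, dy, dz = x_coeffs, y_coeffs, z_coeffs   # d(au + bv + c) = a du + b dv
    return A * wedge(dy, dz) + B * wedge(dz, dx) + C * wedge(dx, dy)


# x = 1 + u - 2v, y = 4u + 2v, z = 1 - v; the 2-form dy dz - 2 dx dy.
coeff = pullback_two_form(1, 0, -2, (1, -2), (4, 2), (0, -1))
print(coeff)                      # -24, i.e. the pullback is -24 du dv
print(coeff * Fraction(1, 2))     # -12, its value on (0, 0), (1, 0), (0, 1)
```

Only the linear coefficients are passed in, which makes visible that translations never affect a pullback.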


The justification of these formal rules of computation— 
which appear quite mysterious at first—is completely 
pragmatic. They are simple to apply and they give very 
quickly the solution to the problem of evaluating 2-forms. 
For example: In order to find the value of the 2-form
dy dz − 2 dx dy on the oriented triangle whose vertices
are (1, 0, 1), (2, 4, 1), (−1, 2, 0), one can first write this
triangle as the image of the oriented triangle (0, 0), (1, 0),
(0, 1) in the uv-plane under the affine mapping


$$\begin{aligned} x &= 1 + u - 2v \\ y &= 4u + 2v \\ z &= 1 - v. \end{aligned}$$


The value of dy dz on this triangle is the oriented area of
its projection on the yz-plane, that is, the oriented area
of the image of (0, 0), (1, 0), (0, 1) under

$$y = 4u + 2v, \qquad z = 1 - v,$$

which is the value of (4 du + 2 dv)(−dv) = −4 du dv on
(0, 0), (1, 0), (0, 1). Since the oriented area du dv of this
triangle is 1/2, the answer is −2. Similarly the value of
−2 dx dy is found by

$$\begin{aligned} -2\,dx\,dy &= -2(du - 2\,dv)(4\,du + 2\,dv) \\ &= -2(2\,du\,dv - 8\,dv\,du) \\ &= -20\,du\,dv, \end{aligned}$$

hence the value on the triangle is −10. Altogether the
value of dy dz − 2 dx dy on the given triangle is therefore
−12. The computation is best done all at once by
writing

$$\begin{aligned} dy\,dz - 2\,dx\,dy &= (4\,du + 2\,dv)(-dv) - 2(du - 2\,dv)(4\,du + 2\,dv) \\ &= -4\,du\,dv - 4\,du\,dv + 16\,dv\,du \\ &= -24\,du\,dv, \end{aligned}$$

so that the value on the triangle (0, 0), (1, 0), (0, 1) is
−12.

The evaluation of an arbitrary 2-form A dy dz +
B dz dx + C dx dy on an arbitrary oriented triangle
(x₀, y₀, z₀), (x₁, y₁, z₁), (x₂, y₂, z₂) can be accomplished
in the same way using the computational rules
for finding pullbacks. In the following chapters it is these
computational rules du du = 0, du dv = −dv du, and
d(au + bv + c) = a du + b dv which are of primary
importance. Their use in the evaluation of 2-forms is only
one of their many applications.

Exercises

1 Redo Exercises 1, 2, 3 of §1.2 using the techniques of this 
section. 


2 Find the oriented area of the triangle PQR (oriented by
this order of the vertices) in each of the following cases by (a) 
drawing the triangle and using geometry, and (b) using the 
techniques of this section. 


P = (0, 0) Q=(,2) R= (2,0) 

P = (1,1) Q =(,1) R= (2,3) 

P = (0,0) Q=(2,1) R= (1,2) 
3 Evaluate the 2-form 3 dy dz + 2 dx dy on the triangle
PQR where

(a) P = (3, 1, 4), Q = (−2, 1, 4), R
(b) P = (0, 0, 0), Q = (1, 2, 1), R =

and on the parallelogram PQRS where
P = (1, 5, 3), Q = (2, 7, 6), R = (8, 12, 10), S = (7, 10, 7).




4 (a) Find the formula for the oriented area of a triangle 
whose vertices are 


P = (0, 0), Q = (x₁, y₁), R = (x₂, y₂)

in that order.
(b) Find the oriented area of the triangle PQR when

P = (x₀, y₀), Q = (x₁, y₁), R = (x₂, y₂).


Write the answer as the sum of three similar terms. 
(c) For a unit flow in the z-direction find the total flow 
into the tetrahedron with vertices 


P = (x₀, y₀, 0), Q = (x₁, y₁, 0), R = (x₂, y₂, 0), S = (0, 0, 1),


that is, find the flow across each of the four sides, 
orienting them appropriately, and add. Relate the 
answer to (b). 

(d) If S is a point inside the triangle PQR then


area (PQR) = area (PQS) + area (QRS) + area (RPS). 


Is the same true for S outside the triangle? What if 
oriented area is used instead of area? Draw pictures 
illustrating a few examples. 


5 (a) Continuing Exercise 4, find the formula for the 
oriented area of a quadrilateral with vertices 


P = (xo, yo), Q = (x1, y1), R = (x2, y2), S = (x3, y3) 


[Write the quadrilateral as a pair of oriented triangles 
joined along a side.] 

(b) Show that the formula for the oriented area of an 
n-gon is a sum of n similar terms, one for each side of 
the n-gon. 

(c) A closed, oriented, polygonal surface in space is a set 
of oriented polygons with the property that the 
boundary cancels; that is, every oriented line segment
PQ which occurs in the boundary of one
polygon occurs with the opposite orientation QP as a
part of the boundary of another polygon (in the same
way that a closed oriented polygonal curve is a collection
of oriented line segments with the property that
every point which is the beginning point of one line
segment is the end point of another segment), or,
more precisely, PQ occurs with the same multiplicity
as QP. Using (b), show that the total value of dx dy
(unit flow in the z-direction) on any closed, oriented
polygonal surface is zero.

(d) Generalize (c) to show that the total value of any 2- 
form on any closed, oriented polygonal surface is zero. 

(e) Generalizing Exercise 7, §1.1, show that the flow
across an oriented polygonal surface depends only on
the oriented curve which is its boundary.

6 Given that flow across is described by the 2-form
3 dy dz − 1 dz dx + 11 dx dy, what, based on the derivation
of §1.2, would be the flow vector; that is, what is the
direction of flow? Verify your guess by showing that the flow
across any parallelogram, one side of which has this direction,
is zero.

7 Let

$$u = 2 + 3x - y, \qquad v = 11 - 4x + y$$

be an affine mapping of the xy-plane to the uv-plane.

(a) Give a picture of this ‘mapping’ by drawing the lines
u = ..., −2, −1, 0, 1, 2, ... and v = ..., −2, −1,
0, 1, 2, ... in the xy-plane.

(b) In the same way draw the lines x = const., y = const.
in the uv-plane.

(c) Draw a few triangles in the xy-plane and show their
images in the uv-plane.

(d) Judging from the drawings, would you say that the
mapping preserves orientations or reverses them?

(e) Find the pullback of du dv. Relate the answer to
part (d).

8 Show that every affine map can be written as a composition
of the simple types listed in the text. [The map (1) can be
so written if every such map in which c = c' = 0 can be so
written. The map x = v, y = u can be so written. The map
(1) with c = c' = 0 can be so written if either of the maps

$$x = bu + av,\ y = b'u + a'v \qquad \text{or} \qquad x = a'u + b'v,\ y = au + bv$$

can be so written. The map x = 0, y = 0 can be so written.
Therefore one can assume a ≠ 0. Therefore one can assume
a = 1. Therefore one can assume a' = 0. Therefore one can
assume b = 0. Thus it suffices to show that the map x = u,
y = b'v can be so written, which is obviously the case.]

1.4 Three-Forms


A three-form (3-form) on xyz-space is an expression of 
the form A dx dy dz where A is a number. The pullback 
of a 3-form A dx dy dz under an affine map 


$$\begin{aligned} x &= au + bv + cw + e \\ y &= a'u + b'v + c'w + e' \\ z &= a''u + b''v + c''w + e'' \end{aligned} \qquad (1)$$




of uvw-space to xyz-space is defined to be the 3-form 
B du dv dw found by performing the substitutions 


$$\begin{aligned} dx &= a\,du + b\,dv + c\,dw \\ dy &= a'\,du + b'\,dv + c'\,dw \\ dz &= a''\,du + b''\,dv + c''\,dw \end{aligned}$$

and multiplying out, using the distributive law of
multiplication and the formal rules du du = 0, du dv =
−dv du, etc., to reduce the resulting expression to the
form B du dv dw.

When carried out, the substitution gives 


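$$\begin{aligned} A\,dx\,dy\,dz &= A(a\,du + b\,dv + c\,dw)(a'\,du + b'\,dv + c'\,dw)(a''\,du + b''\,dv + c''\,dw) \\ &= Aaa'a''\,du\,du\,du + Aaa'b''\,du\,du\,dv + \cdots \quad (27 \text{ terms}) \\ &= Aab'c''\,du\,dv\,dw + Aac'b''\,du\,dw\,dv + \cdots \quad (6 \text{ terms}) \\ &= A(ab'c'' + bc'a'' + ca'b'' - cb'a'' - ba'c'' - ac'b'')\,du\,dv\,dw. \end{aligned}$$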


However, in most cases it is as easy to carry out the 
substitution directly as it is to use this formula. 
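The 27-term expansion is mechanical enough to hand to a computer. The Python sketch below (function names are hypothetical) expands the product term by term, drops any term with a repeated differential, and attaches the sign needed to reorder the remaining differentials into du dv dw. It then checks the result against the six-term formula. The coefficient rows (a, b, c), (a', b', c'), (a'', b'', c'') are written with 1 and 2 in place of the primes, and the sample rows are arbitrary.

```python
from itertools import product


def reorder_sign(indices):
    """Sign relating d(i) d(j) d(k) to du dv dw, where 0, 1, 2 stand for u, v, w.

    Returns 0 if an index repeats (du du = 0, etc.); otherwise +1 or -1
    according to whether an even or odd number of swaps (du dv = -dv du)
    puts the factors in the order du dv dw.
    """
    i, j, k = indices
    if len({i, j, k}) < 3:
        return 0
    inversions = (i > j) + (i > k) + (j > k)
    return -1 if inversions % 2 else 1


def pullback_three_form(A, rows):
    """B such that A dx dy dz pulls back to B du dv dw, where rows lists the
    du, dv, dw coefficients of dx, dy and dz."""
    total = 0
    for idx in product(range(3), repeat=3):      # the 27 terms of the expansion
        s = reorder_sign(idx)
        if s:                                    # only 6 terms survive
            total += s * rows[0][idx[0]] * rows[1][idx[1]] * rows[2][idx[2]]
    return A * total


def six_term_formula(A, rows):
    (a, b, c), (a1, b1, c1), (a2, b2, c2) = rows
    return A * (a*b1*c2 + b*c1*a2 + c*a1*b2 - c*b1*a2 - b*a1*c2 - a*c1*b2)


rows = [(1, 2, 3), (0, 1, 4), (5, 6, 0)]
print(pullback_three_form(1, rows), six_term_formula(1, rows))  # 1 1
```

The first function follows the substitution rules literally; the second is the closed formula above, so agreement between them is a check on the bookkeeping.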

The 3-form dx dy dz can be interpreted as the function 
‘oriented volume’ assigning numbers to oriented three- 
dimensional figures in xyz-space in the same way that 
dx dy is the oriented area of two-dimensional figures in 
the xy-plane and in the same way that dx is the oriented
length of intervals of the x-axis. The principal fact about 
3-forms which is needed to establish the plausibility of 
the interpretation of dx dy dz as oriented volume is the 
following purely algebraic statement. 


Theorem 


Let two affine maps 


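$$\begin{aligned} u &= \alpha r + \beta s + \gamma t + \epsilon \\ v &= \alpha' r + \beta' s + \gamma' t + \epsilon' \\ w &= \alpha'' r + \beta'' s + \gamma'' t + \epsilon'' \end{aligned} \qquad \text{and} \qquad \begin{aligned} x &= au + bv + cw + e \\ y &= a'u + b'v + c'w + e' \\ z &= a''u + b''v + c''w + e'' \end{aligned}$$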


be given. Then the pullback of dx dy dz under the com- 
posed map (of rst-space to xyz-space) is equal to the 
pullback under the first map (of rst-space to uvw-space) 
of the pullback under the second map (of uvw-space to 
xyz-space) of dx dy dz. In short, the pullback under a 
composed map is equal to the pullback of the pullback. 


Computationally the theorem states that the 3-form
dx dy dz can be ‘expressed in terms of rst’ either by first
expressing xyz directly in terms of rst and then forming
the pullback or by first expressing dx dy dz as a 3-form
in uvw and then expressing this 3-form as a 3-form in rst.

[Figure: orientations agree.]

*Because when coordinate axes are drawn as indicated in §1.2 the orientation (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1) agrees with the orientation P₀P₁P₂P₃, where P₀ is the base of the thumb, P₁ the tip of the thumb, P₂ the tip of the index finger, and P₃ the tip of the third finger of a right hand held in the natural position so that these points are non-coplanar.
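Under an affine map the pullback of dx dy dz is the six-term coefficient formed from the nine coefficients of u, v, w, and the translation constants e, e', e'', ε, ε', ε'' drop out because the differential of a constant is zero. With these facts the theorem can be spot-checked numerically. The Python sketch below (names are hypothetical; random integer coefficients, so this is a check and not the proof promised in §4.2) compares the pullback under the composed map with the pullback of the pullback.

```python
import random


def three_form_coefficient(m):
    """Six-term coefficient: the pullback of dx dy dz is this times du dv dw,
    when row i of m lists the du, dv, dw coefficients of dx, dy, dz."""
    (a, b, c), (a1, b1, c1), (a2, b2, c2) = m
    return a*b1*c2 + b*c1*a2 + c*a1*b2 - c*b1*a2 - b*a1*c2 - a*c1*b2


def compose(outer, inner):
    """Coefficients of r, s, t in x, y, z for the composed map, where outer
    gives x, y, z in terms of u, v, w and inner gives u, v, w in terms of r, s, t."""
    return [[sum(outer[i][k] * inner[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def theorem_holds(trials=500, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        inner = [[rng.randint(-5, 5) for _ in range(3)] for _ in range(3)]
        outer = [[rng.randint(-5, 5) for _ in range(3)] for _ in range(3)]
        direct = three_form_coefficient(compose(outer, inner))
        # Pull back dx dy dz to uvw-space, then pull that 3-form back to rst-space.
        twice = three_form_coefficient(outer) * three_form_coefficient(inner)
        if direct != twice:
            return False
    return True


print(theorem_holds())  # True
```

The `compose` step is exactly the substitution of u, v, w in terms of r, s, t into the expressions for x, y, z, restricted to the coefficients that survive differentiation.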

This theorem can be proved, in the same way that the 
analogous theorem for 2-forms was proved in §1.3, by 
writing out the computations explicitly in terms of the 18
coefficients a, b, c, a', ..., β'', γ'' and showing that the
two 3-forms in dr ds dt which result are algebraically
identical. These computations are quite long and can be 
avoided by examining more carefully the nature of the 
algebraic rules by which pullbacks are found. A simple 
proof based on such an examination of the algebraic 
rules is given in §4.2. For the moment it is the practical 
application of the computational rules to specific ex- 
amples which should be emphasized. The proof of the 
theorem will therefore be postponed to Chapter 4. 

In order to show that dx dy dz can be interpreted as 
oriented volume it is necessary to have an intuitive idea 
of how 3-dimensional figures can be oriented. To see 
how this is done it is useful to reformulate the idea of the 
orientation of 2-dimensional figures as follows: An 
orientation of a plane can be specified by giving three
non-collinear points P₀P₁P₂. Two orientations P₀P₁P₂,
P₀'P₁'P₂' are said to agree if the points P₀P₁P₂ can be
moved to P₀'P₁'P₂' in such a way that throughout the
motion the three points remain non-collinear. Otherwise
the orientations are said to be opposite. Then it is
geometrically plausible that the orientations P₀P₁P₂ and
P₁P₀P₂ are opposite (do not agree) and that every
orientation agrees either with P₀P₁P₂ or with P₁P₀P₂.
Thus all orientations P₀'P₁'P₂' are divided into two classes
by P₀P₁P₂: those which agree with P₀P₁P₂ and those
which agree with P₁P₀P₂. In the xy-plane these classes
are called clockwise and counterclockwise, the
counterclockwise orientations being those which agree with the
orientation (0, 0), (1, 0), (0, 1).

In the same way, an orientation of space can be
described by giving four non-coplanar points P₀P₁P₂P₃.
Two orientations P₀P₁P₂P₃ and P₀'P₁'P₂'P₃' agree if the
points of one can be moved to the points of the other
keeping them non-coplanar all the while. All
orientations fall into two classes such that two orientations in
the same class agree. In xyz-space these classes are called
left-handed and right-handed, an orientation being called
right-handed if it agrees with the orientation (0, 0, 0),
(1, 0, 0), (0, 1, 0), (0, 0, 1).*
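Whether four points are coplanar, right-handed, or left-handed can be decided by a computation. This takes for granted the interpretation of dx dy dz as oriented volume that this section goes on to develop (the rigorous proof is postponed to Chapter 6). The affine map x = P₀ + u(P₁ − P₀) + v(P₂ − P₀) + w(P₃ − P₀) carries (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1) to P₀, P₁, P₂, P₃, so the sign of its six-term coefficient tells whether the two orientations agree. The Python sketch below (function names are hypothetical) implements this. It also divides the coefficient by 6 to get the oriented volume of the tetrahedron P₀P₁P₂P₃, using the value 1/6 for the standard tetrahedron (Exercise 5).

```python
from fractions import Fraction


def three_form_coefficient(m):
    """Six-term coefficient of du dv dw in the pullback of dx dy dz,
    where row i of m lists the du, dv, dw coefficients of dx, dy, dz."""
    (a, b, c), (a1, b1, c1), (a2, b2, c2) = m
    return a*b1*c2 + b*c1*a2 + c*a1*b2 - c*b1*a2 - b*a1*c2 - a*c1*b2


def edge_matrix(p0, p1, p2, p3):
    # Row i holds the u, v, w coefficients of the i-th coordinate in the map
    # x = p0 + u (p1 - p0) + v (p2 - p0) + w (p3 - p0).
    return [[p1[i] - p0[i], p2[i] - p0[i], p3[i] - p0[i]] for i in range(3)]


def orientation(p0, p1, p2, p3):
    """'right-handed', 'left-handed' or 'coplanar' for the points p0 p1 p2 p3."""
    B = three_form_coefficient(edge_matrix(p0, p1, p2, p3))
    if B > 0:
        return "right-handed"
    if B < 0:
        return "left-handed"
    return "coplanar"


def oriented_volume(p0, p1, p2, p3):
    """Oriented volume of the tetrahedron p0 p1 p2 p3 (standard one: 1/6)."""
    return Fraction(three_form_coefficient(edge_matrix(p0, p1, p2, p3))) / 6


O, X, Y, Z = (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(orientation(O, X, Y, Z), oriented_volume(O, X, Y, Z))  # right-handed 1/6
print(orientation(O, Y, X, Z))                               # left-handed
print(orientation(O, X, Y, (1, 1, 0)))                       # coplanar
```

Interchanging two of the points reverses the sign, in keeping with the statement that the orientations P₀P₁P₂ and P₁P₀P₂ in the plane are opposite.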


With this definition of orientation the 3-form dx dy dz
can be described as the function ‘oriented volume’
assigning to oriented solids in xyz-space the number which is
the volume of the solid if its orientation is right-handed
and which is minus the volume if its orientation is
left-handed. If the solid consists of several oriented pieces
then the oriented volume of the whole is defined to be the
sum of the oriented volumes of the pieces.

The notion of ‘pullback’ can then be described as
follows: Let A dx dy dz be given and let a map (1) be
given. The map (1) carries oriented solids in uvw-space
to oriented solids in xyz-space. The 3-form A dx dy dz
assigns numbers to oriented solids in xyz-space, namely,
A times the oriented volume. The composition of these
two operations assigns numbers to oriented solids in
uvw-space. This new rule assigning numbers to oriented
solids in uvw-space is clearly proportional to oriented
volume; that is, it is of the form B du dv dw for some
number B. This geometrically defined 3-form B du dv dw
is in fact identical with the pullback of A dx dy dz under
the map (1) as defined algebraically above.

As in §1.3, the plausibility of the assertion that the
3-form B du dv dw can be found by applying the
algebraic rules which define the pullback operation can be
established by decomposing the given affine map into a
sequence of simple operations for which the algebraic
rules clearly give the correct answer: operations such as
rotation by 90° around a coordinate axis, reflections in
coordinate planes, shears, scale factors, and translations.
The rigorous proof, which must await the rigorous
definition of ‘oriented volume’, is given in Chapter 6.
Meanwhile this fact gives a very useful conceptual
interpretation of the algebraic operations by which pullbacks
are defined. This conceptual interpretation will frequently
be used to explain ideas, but it will not be used to define
concepts or to prove theorems until Chapter 6.

Exercises

1 Show that the pullback of dx dy dz under the composition 
of the maps 

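$$\begin{aligned} u &= r + 3s \\ v &= 2s + t \\ w &= r - s + t \end{aligned} \qquad\qquad \begin{aligned} x &= 2u + v \\ y &= 3u + v \\ z &= u + v + w \end{aligned}$$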


is equal to the pullback of the pullback; that is, verify the 
theorem for this case. 


2 Do the same for the composition of the maps

u = 10r − 7s +4+ t    x = 2u + v + w
v = 3r + 5s − t    y = 13u − v + w
w = r + s + 2t    z = −u + Ww − 2w.

3 For each of the following triples of points in the plane
determine whether they are collinear or determine a
counterclockwise orientation or determine a clockwise orientation.

(a) (0, 0), (2, 1), (4, 3)    (b) (3, 4), (7, −5), (1, 2)
(c) (−1, 3), (3, 5), (7, 7)    (d) (7, 1), (7, 2), (−1, 3).

[Write each triple as the image of the standard triple (0, 0),
(1, 0), (0, 1) under an affine map and find the effect of the
map on oriented areas. Verify the answers by drawing
pictures.]

4 For each of the following quadruples of points in space,
determine whether they are coplanar or determine a
right-handed orientation or determine a left-handed orientation.

(a) (0, 0, 0), (2, 1, 4), G, 4, 7), Q, 2, 9)
(b) (3, −1, −2), (−1, 0, 1), (0, 1, 1), (4, 2, −2)
(c) (2, 1, 4), (7, 9, 6), (3, 5, 6), (2, 4, 6).

5 Show that the oriented volume of the tetrahedron (0, 0, 0),
(1, 0, 0), (0, 1, 0), (0, 0, 1) is the same as that of the
tetrahedron (0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1). (A tetrahedron is
oriented by its vertices in the obvious way.) Show that this
volume is 1/6 by showing that the unit cube {0 < x < 1,
0 < y < 1, 0 < z < 1} can be divided into 6 such tetrahedra.

6 Find the oriented volume of the tetrahedra in Exercise 4.

7 How is the orientation of a line described?

1.5 Summary


A 1-form in the three variables x, y, z is an expression of
the form A dx + B dy + C dz where A, B, C are
numbers and dx, dy, dz are symbols. A 2-form in x, y, z is an
expression of the form A dy dz + B dz dx + C dx dy in
which A, B, C are numbers and dy dz, dz dx, dx dy are
symbols. A 3-form in x, y, z is an expression of the form 
A dx dy dz where A is a number and dx dy dz is a symbol. 

The symbol dx should be thought of as representing 
‘oriented length of the projection on the x-axis’, a 
function assigning numbers to oriented curves in space. 
The symbols dy, dz have analogous interpretations. The 




symbol dx dy should be thought of as representing 
‘oriented area of the projection on the xy-plane’, a 
function assigning numbers to oriented surfaces in space. 
The symbols dy dz, dz dx have analogous interpretations. 
The symbol dx dy dz should be thought of as representing 
‘oriented volume’, a function assigning numbers to 
oriented solids in space. 

Given a k-form (k=1, 2, or 3) in the variables x, y, z 
and given an affine mapping 


$$\begin{aligned} x &= au + bv + cw + e \\ y &= a'u + b'v + c'w + e' \\ z &= a''u + b''v + c''w + e'' \end{aligned} \qquad (1)$$


there is a k-form in u, v, w called the pullback of the given 
k-form under the given affine map, defined by the
computational rules dx = d(au + bv + cw + e) = a du +
b dv + c dw, du du = 0, du dv = −dv du, etc.

In the same way the computational rules define the 
pullback of a k-form in m variables under an affine 
mapping which expresses the m variables in terms of n 
other variables. Only cases in which m, n are less than or 
equal to 3 and k is less than or equal to m were con- 
sidered, but the method of computation extends im- 
mediately. 
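The remark that the method extends to other numbers of variables can be made concrete. The Python sketch below shows one possible encoding; the representation and the names `wedge` and `pullback` are hypothetical, not the book's notation. A form is a dictionary from increasing tuples of variable indices to coefficients, so {(1, 2): 1, (0, 1): -2} means dy dz − 2 dx dy with x, y, z numbered 0, 1, 2 (a term B dz dx must be entered as −B under the key (0, 2)). Multiplication applies the two rules directly: a repeated differential gives 0, and each swap of adjacent differentials changes the sign. Pulling back substitutes dxᵢ = Jᵢ₀ du₀ + ... + Jᵢ,ₙ₋₁ duₙ₋₁, where J is the array of coefficients of the affine map, and multiplies out.

```python
def wedge(f, g):
    """Product of two forms, each a dict {increasing index tuple: coefficient}.

    A repeated differential makes a term 0 (du du = 0); putting the
    differentials back in increasing order costs a sign per swap
    (du dv = -dv du).
    """
    result = {}
    for I, a in f.items():
        for J, b in g.items():
            if set(I) & set(J):
                continue                      # repeated differential: term is 0
            merged, sign = list(I + J), 1
            # Bubble sort, flipping the sign at every adjacent swap.
            for i in range(len(merged)):
                for j in range(len(merged) - 1 - i):
                    if merged[j] > merged[j + 1]:
                        merged[j], merged[j + 1] = merged[j + 1], merged[j]
                        sign = -sign
            K = tuple(merged)
            result[K] = result.get(K, 0) + sign * a * b
    return {K: c for K, c in result.items() if c != 0}


def pullback(form, J):
    """Pull back a form in x_0, ..., x_{m-1} under the affine map
    x_i = J[i][0] u_0 + ... + J[i][n-1] u_{n-1} + (constant)."""
    n = len(J[0])
    # d x_i = sum_j J[i][j] d u_j; the constant term has differential 0.
    dx = [{(j,): row[j] for j in range(n) if row[j] != 0} for row in J]
    result = {}
    for I, coeff in form.items():
        term = {(): coeff}                    # the number coeff, as a 0-form
        for i in I:
            term = wedge(term, dx[i])
        for K, c in term.items():
            result[K] = result.get(K, 0) + c
    return {K: c for K, c in result.items() if c != 0}


# dy dz - 2 dx dy under x = 1 + u - 2v, y = 4u + 2v, z = 1 - v (x, y, z -> 0, 1, 2).
print(pullback({(1, 2): 1, (0, 1): -2}, [[1, -2], [4, 2], [0, -1]]))  # {(0, 1): -24}
```

The same two functions handle 1-forms, 2-forms, and 3-forms, and nothing in them limits m, n, or k to 3.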

Geometrically, the pullback operation can be inter- 
preted as a composed function. For example, if A dy dz + 
B dz dx + C dx dy is a 2-form on xyz-space and if (1) 
is an affine map of uvw-space to xyz-space then the rule 
‘evaluate A dy dz + B dz dx + C dx dy on the image
under (1)’ assigns numbers to oriented surfaces in
uvw-space. This function is a 2-form on uvw-space,
namely, the 2-form obtained from the given 2-form and 
the given affine map by forming the pullback. Other 
pullbacks have analogous interpretations. The connec- 
tion between this geometrical interpretation of pullback 
and its actual algebraic definition has been indicated by 
plausibility arguments. A rigorous statement and proof 
are given in Chapter 6. 
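The computational rules are mechanical enough to hand to a short program. The Python sketch below is not from the text, and all of its function names are hypothetical. It stores the affine map (1) as its three rows of coefficients of u, v, w. It stores a 2-form by its coefficients on dy dz, dz dx, dx dy. It then checks the composed-function interpretation in one instance: the pullback evaluated on a parallelogram in uvw-space should equal the original 2-form evaluated on the image parallelogram.

```python
# Sketch (not from the text): pullbacks of constant forms on xyz-space under
# the affine map (1).  rows = [[a, b, c], [a', b', c'], [a'', b'', c'']] holds
# the coefficients of u, v, w; the constants e, e', e'' are left out because
# their differentials are zero.

def pull_1form(P, Q, R, rows):
    """Pull back P dx + Q dy + R dz; return the coefficients of du, dv, dw."""
    dx, dy, dz = rows  # dx = a du + b dv + c dw, and similarly for dy, dz
    return [P * dx[k] + Q * dy[k] + R * dz[k] for k in range(3)]

def wedge(p, q):
    """Product of two 1-forms in three variables, using du du = 0 and
    dv du = -du dv.  Returns the coefficients of dv dw, dw du, du dv
    (for forms in x, y, z: of dy dz, dz dx, dx dy)."""
    return [p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0]]

def pull_2form(A, B, C, rows):
    """Pull back A dy dz + B dz dx + C dx dy; return the coefficients of
    dv dw, dw du, du dv."""
    dx, dy, dz = rows
    parts = [(A, wedge(dy, dz)), (B, wedge(dz, dx)), (C, wedge(dx, dy))]
    return [sum(coef * w[k] for coef, w in parts) for k in range(3)]

def pull_3form(A, rows):
    """Pull back A dx dy dz; return the coefficient of du dv dw."""
    dx, dy, dz = rows
    s = wedge(dy, dz)
    # du (dv dw), dv (dw du) and dw (du dv) all equal du dv dw
    return A * (dx[0] * s[0] + dx[1] * s[1] + dx[2] * s[2])

def eval_2form(coefs, p, q):
    """Value of a constant 2-form on the parallelogram spanned by the vectors
    p, q (in that order); wedge(p, q) lists the oriented areas of the
    parallelogram's projections on the three coordinate planes."""
    return sum(c * s for c, s in zip(coefs, wedge(p, q)))

def linear_part(rows, p):
    """Image of a displacement vector p under the map (1)."""
    return [sum(r[k] * p[k] for k in range(3)) for r in rows]

if __name__ == "__main__":
    rows = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]  # an arbitrary example map
    form = (5, -1, 2)                          # 5 dy dz - dz dx + 2 dx dy
    p, q = [1, 0, 2], [0, 3, -1]               # spans a parallelogram in uvw-space
    print(eval_2form(pull_2form(*form, rows), p, q))                     # -139
    print(eval_2form(form, linear_part(rows, p), linear_part(rows, q)))  # -139
```

The check works because the image under (1) of the parallelogram spanned by p, q is the parallelogram spanned by the images of p, q. The constants e, e′, e″ only translate it, and a translation does not change oriented projected areas.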

In addition to the functions ‘oriented length’, ‘oriented 
area’, and ‘oriented volume’, examples of forms are 
provided by the functions ‘work required for displacements in a constant force field’, which is a 1-form, and
‘rate of flow of a constantly flowing fluid across oriented 
surfaces in space’, which is a 2-form. 




Exercises 


1 Under the affine mapping 
$$
\begin{aligned}
x &= 2 + 3u + 4v\\
y &= 1 + 2u - 3v\\
z &= 7 - 5u + 2v
\end{aligned}
$$


find the pullbacks of dx, 3 dx + 2 dy − 2 dz, dx + dy + dz,
3 dx dy, 8 dy dz + 3 dz dx + dx dy.


2 Under the affine mapping 


$$
\begin{aligned}
u &= 7 - 3x + 4y + 12z\\
v &= x + y + z
\end{aligned}
$$


find the pullbacks of 2 du + 3 dv, du + 3 dv, du dv, 4 du dv, 
3 du dv. 


3 If work in a constant force field is given by 3 dx − 2 dy + 2 dz and if x = 3t, y = t, z = 4 + 3t is the position (x, y, z) of the particle at time t, how much work must be done during the time interval 0 < t < 3? During 0 < t < 2? During −10 < t < −8? Is the function ‘work per time interval’ a 1-form in this case? Describe this as a pullback.


4 A 1-form in 3 variables has 3 components, a 2-form 3 
components, and a 3-form 1 component. How many com- 
ponents does a 1-form in 4 variables have? A 2-form? A 
3-form? A 4-form? How many components does a k-form in 
n variables have? [How many ‘k-dimensional coordinate
planes’ are there in an n-dimensional space?]


5 A natural way to describe a constant flow in the plane is to say that in time t the point (x, y) moves to (x + At, y + Bt), where A, B are the x- and y-components of the constant flow. Assuming the fluid has unit density, find the 1-form which describes this planar flow. [The fluid crossing a given line segment in unit time is contained in a certain parallelogram.]


6 Let (x + tA, y + tB, z + tC) describe a flow in space.
As in Exercise 5, assume the fluid has unit density and find 
the 2-form which describes this flow. 


Chapter 2 | Integrals


2.1 | Non-Constant Forms

In Chapter 1 a constant force field is described by a 1-form

$$A\,dx + B\,dy + C\,dz,$$

where


A = work required for unit displacement in x-direction 
B = work required for unit displacement in y-direction 
C = work required for unit displacement in z-direction. 


In a force field in which the force depends on the location (x, y, z), the quantities A, B, C depend on x, y, z, that is, A = A(x, y, z), B = B(x, y, z), C = C(x, y, z). The expression


(1) A(x, y, z) dx + B(x, y, z) dy + C(x, y, z) dz 


can then be regarded as assigning to each point (x, y, z)
of space the 1-form which describes the force field at that 
point. 

Similarly, a non-constant flow is described by an ex- 
pression of the form 


(2) A(x, y, z) dy dz + B(x, y, z) dz dx + C(x, y, z) dx dy




H.M. Edwards, Advanced Calculus: A Differential Forms Approach, Modern Birkhauser Classics, 
DOI 10.1007/978-0-8176-8412-9_2, © Harold M. Edwards 2014 




*The mathematical terminology is 
unfortunately ambiguous, the term 
‘form’ referring both to (variable)
forms and to constant forms. [The 
terminology of physics makes a very 
useful distinction between vectors 
and vector fields which is exactly the 
distinction between constant forms 
and forms. A (variable) form is a 
certain kind of tensor field (namely a 
field of alternating covariant tensors) 
and physicists may prefer to think of 
forms as alternating covariant tensor 
fields.] Since constant forms are the
exception rather than the rule, it is 
only reasonable to use the shortest 
term possible (‘form’) for the idea 
which occurs most frequently and to 
use the longer term (‘constant form’) 
for the exceptional cases. 


assigning to each point (x, y, z) of space the 2-form which describes the flow at that point.

Henceforth an expression such as (1) will be called a 1-form on xyz-space, and what was called a 1-form in Chapter 1 will be called a constant 1-form (a 1-form in which the functions A, B, C are constant). Similarly, a 2-form on xyz-space will mean an expression of the form (2), a 1-form on the xy-plane will mean an expression of the form

A(x, y) dx + B(x, y) dy,

a 3-form on xyz-space will mean an expression of the form

A(x, y, z) dx dy dz,

and so forth. If there is danger of confusion, the term ‘variable’ will be used in parentheses, e.g. (variable) 1-form, (variable) 2-form, to distinguish 1-forms from constant 1-forms, but henceforth ‘1-form’ will always mean ‘(variable) 1-form’.*


Exercises

1 The central force field. Newton’s law of gravitational 
attraction states that the force exerted by a massive body (the 
sun) fixed at the origin (0,0,0) on a particle in space is 
directed toward the sun and has magnitude proportional to 
the inverse square of the distance to the sun. Show that this 
means that the force field is described by the 1-form 


$$\frac{kx}{r^3}\,dx + \frac{ky}{r^3}\,dy + \frac{kz}{r^3}\,dz$$

where k is a positive constant and r = r(x, y, z) = √(x² + y² + z²); that is, where

$$A(x, y, z) = \frac{kx}{(x^2 + y^2 + z^2)^{3/2}}, \quad \text{etc.}$$


[Show that for each fixed point (x, y, z) this 1-form represents 
a constant force which has the right direction (see Exercise 6, 
§1.1). Then use the fact that the magnitude of a force is 
measured by the amount of work required per unit displace- 
ment in the direction opposing the force.] 


2 Flow from a source (planar flow). Find the 1-form de- 
scribing a planar flow from a source at (0, 0), assuming the 
flow is outward at all points (x, y) and has magnitude inversely proportional to the radius r = r(x, y) = √(x² + y²)
(so that the flow is radially symmetric and the total flow 


across a circle of radius r about the origin is independent of r; that is, so that there is no source between two circles of different radius). The magnitude of a flow is measured by the rate of flow across a line perpendicular to the direction of flow. Sketch the flow vectors.


3 Flow from a source (spatial flow). Find the 2-form describing flow in space from a source at (0, 0, 0), again assuming that the flow is outward at all points with a magnitude depending only on r, and assuming that there are no sources between spheres about the origin. [The surface area of the sphere of radius r is 4πr².]


4 Linear flows. A linear flow is described by a (variable) 0-form assigning numbers to oriented points. Find the 0-form describing flow from a source at 0 on the line. How would a 0-form be described in general? What is a constant 0-form?


2.2 | Integration


If the force field is not constant, then finding the amount 
of work required for a given displacement requires a 
process of integration. The essential idea is that the 1-form 


(1) A(x, y, z) dx + B(x, y, z) dy + C(x, y, z) dz


which describes the force field gives the approximate 
amount of work required for small displacements. The 
amount of work required for a displacement which is not 
small can then be described as a limit of sums of values 
of (1). 

It will be assumed that the force field depends continuously on (x, y, z). That is, it will be assumed that for any given point P = (x̄, ȳ, z̄), the values of the functions A, B, C at points near P differ only slightly from their values at P. This means that throughout a small neighborhood of P, say the cube {|x − x̄| < δ, |y − ȳ| < δ, |z − z̄| < δ}, the force field (1) is practically equal to the constant force field


$$A(\bar x, \bar y, \bar z)\,dx + B(\bar x, \bar y, \bar z)\,dy + C(\bar x, \bar y, \bar z)\,dz$$


so that the work required for a displacement QR inside 
this neighborhood is practically equal to the value of this 
constant 1-form on the oriented line segment QR. 

This relationship between the 1-form (1) and work is 
expressed by writing 


(2) work required for small displacements ~ A(x, y, z) dx + B(x, y, z) dy + C(x, y, z) dz.




[Read ~ as ‘is approximately equal to’.] Thus, if the points P = (x̄, ȳ, z̄), Q = (x₁, y₁, z₁), R = (x₂, y₂, z₂) are close together, then the amount of work required to go from Q to R is approximately equal to

$$A(\bar x, \bar y, \bar z)(x_2 - x_1) + B(\bar x, \bar y, \bar z)(y_2 - y_1) + C(\bar x, \bar y, \bar z)(z_2 - z_1),$$


*It is precisely in this context, of course, that dx, dy, dz are thought of as being ‘infinitesimals’. The point of view taken here, however, is that dx, dy, dz are functions assigning numbers to directed line segments. What is ‘infinitesimal’, then, is the line segment on which they are evaluated. Instead of saying that (2) holds for small line segments, with the approximation improving for shorter line segments, it is often said simply that work required for ‘infinitesimal’ displacements = A(x, y, z) dx + B(x, y, z) dy + C(x, y, z) dz.


and the closer together they are, the better the approxima- 
tion.* 

If S is any oriented curve, then an approximation to 
the amount of work required for the displacement S is 
found as follows: Approximate S by an oriented po- 
lygonal curve consisting of short straight-line displace- 
ments. The approximate amount of work required for 
each of these is found from (2), and the amount required 
for S is approximately equal to the sum of these values. 
The number found in this way is called an approximating 
sum; there are two approximations involved in the pro- 
cess: first, the approximation of the curve S by a po- 
lygonal curve, and second, the approximation of the 
amount of work required for each segment of the polyg- 
onal curve by (2). Since both approximations can be 
improved by taking the polygonal approximation to S to 
consist of more and shorter segments, it would be ex- 
pected that the approximating sum could be made 
arbitrarily close to the true value by refining the ap- 
proximation in this way. That is, the amount of work 
required for the displacement S should be equal to the 
limiting value of the approximating sums as the approxi- 
mating curve is taken to consist of more and shorter 
segments fitting the curve S more and more closely. This 
limiting value (if it exists) is called the integral of the 
1-form (1) over the curve S and is denoted 


$$\int_S \big(A(x, y, z)\,dx + B(x, y, z)\,dy + C(x, y, z)\,dz\big)$$

or simply

$$\int_S (A\,dx + B\,dy + C\,dz),$$

where it is understood that A, B, C are functions of x, y, z.
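The definition can be tried out numerically. The Python sketch below makes two assumptions that are not in the text. First, the curve S is given by a parametrization γ(t). Second, the polygonal approximation uses equally spaced parameter values. Each segment QR is handled as in (2), with A, B, C frozen at the midpoint of QR.

```python
import math

# Sketch: an approximating sum for the integral of A dx + B dy + C dz over an
# oriented curve.  Assumptions of this sketch: the curve is given as gamma(t),
# t0 <= t <= t1, and the polygonal approximation uses n equally spaced values
# of t.  On each segment QR the 1-form is frozen at the midpoint P of QR and
# the resulting constant 1-form is evaluated on QR, as in (2).

def approximating_sum(A, B, C, gamma, t0, t1, n):
    total = 0.0
    Q = gamma(t0)
    for k in range(1, n + 1):
        R = gamma(t0 + (t1 - t0) * k / n)
        P = [(q + r) / 2 for q, r in zip(Q, R)]      # where A, B, C are evaluated
        dx, dy, dz = [r - q for q, r in zip(Q, R)]   # dx, dy, dz on the segment QR
        total += A(*P) * dx + B(*P) * dy + C(*P) * dz
        Q = R
    return total

# Hypothetical example: the 1-form -y dx + x dy + dz on one turn of the helix
# (cos t, sin t, t), 0 <= t <= 2 pi, oriented in the direction of increasing t.
A = lambda x, y, z: -y
B = lambda x, y, z: x
C = lambda x, y, z: 1.0
helix = lambda t: (math.cos(t), math.sin(t), t)

for n in (10, 100, 1000):
    print(n, approximating_sum(A, B, C, helix, 0.0, 2 * math.pi, n))
```

For this particular form and curve, each segment contributes sin(2π/n) + 2π/n. The sums therefore equal n sin(2π/n) + 2π and approach 4π as n grows.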

Similarly, let a flow in space be described by a (variable) 2-form A dy dz + B dz dx + C dx dy. Then, again assuming that A, B, C are continuous, the flow across a small polygon near P = (x̄, ȳ, z̄) is approximately equal to the value on this polygon of the constant 2-form

$$A(\bar x, \bar y, \bar z)\,dy\,dz + B(\bar x, \bar y, \bar z)\,dz\,dx + C(\bar x, \bar y, \bar z)\,dx\,dy.$$


This is summarized by saying that 


(3) flow across small polygons ~ A(x, y, z) dy dz + B(x, y, z) dz dx + C(x, y, z) dx dy. 


Then for an arbitrary oriented surface S the flow across S is equal to the limit of approximating sums. Such a sum is obtained by constructing a polygonal approximation to S in which all polygons are small, using (3) to find the approximate rate of flow across each polygon, and adding. The limit of the approximating sums (if it exists) is called the integral of the 2-form over S and is denoted

$$\int_S (A\,dy\,dz + B\,dz\,dx + C\,dx\,dy).$$


In general, an integral is formed from an integrand, 
which is a 1-form, 2-form, or 3-form, and a domain of
integration which is, respectively, an oriented curve, a 
surface, or a solid. The integral is defined as the limit of 
approximating sums, and an approximating sum is 
formed by taking a finely divided polygonal approxima- 
tion to the domain of integration, ‘evaluating’ the 
integrand on each small oriented polygon, and adding. 
The integrand is ‘evaluated’ on a small oriented polygon 
by choosing a point P in the vicinity of the polygon, by 
evaluating the functions A, B, etc. at P to obtain a 
constant form, and by evaluating the constant form on 
the polygon in the usual way (as in Chapter 1). 
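The same recipe can be applied to a 2-form over a surface. The Python sketch below assumes, beyond the text, that S is given by a parametrization φ(u, v) over a rectangle. Each grid cell is split into two triangles with vertices on S, which gives a polygonal approximation to S. The 2-form is frozen at the centroid of each triangle and then evaluated on it, the value being half the value on the parallelogram spanned by two of its edges.

```python
import math

# Sketch (assumption beyond the text): the oriented surface S is given as
# phi(u, v), a <= u <= b, c <= v <= d, oriented by the order (u, v).  Each
# grid cell is split into two triangles with vertices on S, and on each
# triangle the 2-form is frozen at the centroid and then evaluated.

def wedge(p, q):
    """Oriented areas of the projections on the yz-, zx- and xy-planes of the
    parallelogram spanned by p, q (in that order)."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def triangle_value(form, P0, P1, P2):
    """Value on the oriented triangle P0 P1 P2 of the constant 2-form obtained by
    evaluating form = (A, B, C) at the centroid; half the parallelogram value."""
    centroid = [(s + t + w) / 3 for s, t, w in zip(P0, P1, P2)]
    coefs = [f(*centroid) for f in form]
    e1 = [t - s for s, t in zip(P0, P1)]
    e2 = [w - s for s, w in zip(P0, P2)]
    return 0.5 * sum(c * area for c, area in zip(coefs, wedge(e1, e2)))

def surface_sum(form, phi, a, b, c, d, n):
    """Approximating sum over the n x n grid of parameter cells, two triangles each."""
    h, k = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            u, v = a + i * h, c + j * k
            P00, P10 = phi(u, v), phi(u + h, v)
            P11, P01 = phi(u + h, v + k), phi(u, v + k)
            # both triangles are traversed in the same sense as the (u, v) cell
            total += triangle_value(form, P00, P10, P11)
            total += triangle_value(form, P00, P11, P01)
    return total

# Hypothetical example: x dy dz + y dz dx + z dx dy over the upper unit hemisphere,
# s = angle from the z-axis, t = longitude; the order (s, t) makes the normal
# point away from the origin.
form = (lambda x, y, z: x, lambda x, y, z: y, lambda x, y, z: z)
hemisphere = lambda s, t: (math.sin(s) * math.cos(t), math.sin(s) * math.sin(t), math.cos(s))
for n in (10, 50, 100):
    print(n, surface_sum(form, hemisphere, 0.0, math.pi / 2, 0.0, 2 * math.pi, n))
```

Splitting each cell into triangles, rather than using parallelograms, keeps every vertex of the polygonal approximation on S itself.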

At this point two questions arise: How can this defini- 
tion of ‘integral’ be made precise? How can integrals be 
evaluated in specific cases? 

It is difficult to decide which of these questions should 
be considered first. On the one hand, it is hard to com- 
prehend a complicated abstraction such as ‘integral’ 
without concrete numerical examples; but, on the other 
hand, it is hard to understand the numerical evaluation 
of an integral without having a precise definition of what 
the integral is. Yet, to consider both questions at the 
same time would confuse the distinction between the 
definition of integrals (as limits of sums) and the method 
of evaluating integrals (using the Fundamental Theorem 
of Calculus). This confusion is one of the greatest obstacles to understanding calculus and should be avoided at all costs. Therefore, all consideration of the evaluation of integrals is postponed to Chapter 3, and the remainder of this chapter is devoted solely to the question of the definition and elementary properties of integrals.


Exercises


1 Let A dx + B dy + C dz = (1/r³)[x dx + y dy + z dz] be the central force field with k = 1 (see Exercise 1, §2.1), and let S be the line segment from (1, 0, 0) to (2, 0, 0). Find the approximating sum to ∫_S (A dx + B dy + C dz) formed by dividing S into 10 segments of equal length and using the midpoint of each interval to evaluate A dx + B dy + C dz. [Express the answer as a number times a sum of reciprocals of integers; actual numerical evaluation of the sum is difficult.] Call this number Σ(10).

Similarly, let Σ(n) be the approximating sum formed by dividing S into n equal segments (n = positive integer) and evaluating at midpoints. Express Σ(n) as a number times a sum of reciprocals of integers. Suppose that a computing machine has been programmed to compute Σ(n) for any n, the value being rounded to two decimal places. Find an upper bound for the magnitude of the difference |Σ(10) − Σ(20)|. [Each term of Σ(10) corresponds to two terms of Σ(20). To get an upper bound on the magnitude of the difference |(1/x₁²) − (1/x₂²)|, it suffices to observe the following. The slope of the chord of the graph of 1/x² from (x₁, 1/x₁²) to (x₂, 1/x₂²) (1 ≤ x₁ < x₂) is greater (both are negative) than the slope of the tangent at (1, 1), which gives

$$\frac{1}{x_1^2} - \frac{1}{x_2^2} < 2(x_2 - x_1).]$$

If Σ(10) and Σ(20) are both rounded to two decimal places, how great can the difference of the resulting numbers be? Find an integer N such that |Σ(N) − Σ(mN)| < .005 for all integers m. Show that moreover |Σ(n) − Σ(mn)| < .005 for all n > N and all m. Show that |Σ(N) − Σ(n)| < .01 for all n > N. Conclude that the number produced by the computer for any n > N, no matter how large, will differ from the number it produces for N only by ±1 in the last decimal place. This is what it means to say that the number Σ(N), rounded to two decimal places, represents the limiting value (the integral) with an accuracy of two decimal places.
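A program of the kind this exercise imagines might look as follows. It is a hypothetical Python sketch that computes Σ(n) directly in floating point, not in the form asked for in the exercise. It can be used to check the bounds found above against actual output, not to replace them.

```python
# Sketch of the 'computing machine' imagined in Exercise 1 (a hypothetical
# program, not the book's).  sigma(n) is the midpoint approximating sum for
# the integral of (1/r^3)(x dx + y dy + z dz) over the segment S from
# (1, 0, 0) to (2, 0, 0).  On S, y = z = 0 and dy, dz vanish on every piece,
# so only the term A dx with A = x/r^3 = 1/x^2 contributes.

def sigma(n):
    total = 0.0
    for k in range(1, n + 1):
        x_mid = 1 + (k - 0.5) / n            # midpoint of the k-th piece of S
        total += (1 / x_mid ** 2) * (1 / n)  # value of A dx on that piece
    return total

for n in (10, 20, 40, 80, 160):
    print(n, round(sigma(n), 2))             # the machine's two-place output
```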


2 Computation with decimals and decimal approximations. A number a (the approximate value) is said to represent a number t (the true value) with an accuracy of 3 decimal places if |a − t| < .001. Show that:

(a) If a represents t with an accuracy of 3 decimal places, if a₃ is a rounded to 3 decimal places, and if t₃ is t rounded to 3 decimal places, then a₃ differs from t₃ by at most ±1 in the last (third) place.

(b) No matter how close a is to t, a₃ may still differ from t₃ by ±1 in the last place.


3 Form approximating sums to ∫_S (1/x) dx in the same way as in Exercise 1; that is, divide the interval S into n equal segments and evaluate at midpoints. Suppose the computer has been programmed to find this number Σ(n) rounded to three decimal places. Find an N such that Σ(N) represents all Σ(n) for n > N (and hence represents the limiting value) with an accuracy of three decimal places.


4 Let S be the circle {(x, y): x² + y² = 1} oriented counterclockwise, and let A dx + B dy be the 1-form

$$\frac{x}{x^2 + y^2}\,dy - \frac{y}{x^2 + y^2}\,dx$$

of Exercise 2, §2.1, giving flow from a source at the origin. Estimate ∫_S (A dx + B dy) by using the inscribed regular n-gon to approximate S and evaluating A dx + B dy for each segment at the midpoint of the corresponding arc of the circle (because this is easiest). Call the result Σ(n). Express Σ(n) explicitly in terms of the number sin(π/n). [The formula sin(x + y) − sin(x − y) = 2 cos x sin y and the analogous formula for cos(x + y) − cos(x − y) are used.] The formula

$$\lim_{x \to 0} \frac{\sin x}{x} = 1$$

can be used to evaluate the limit as n → ∞.


5 Let x³y²z dx dy be a 2-form describing a flow in space and let S be the rectangle {(x, y, z): 0 < x < 1, 0 < y < 2, z = 1} oriented counterclockwise. Let Σ(n, m) be the approximating sum to ∫_S x³y²z dx dy obtained as follows: divide the x interval 0 < x < 1 into n equal parts, divide the y interval 0 < y < 2 into m equal parts, orient each of the mn rectangles counterclockwise, and evaluate the 2-form at the midpoint of each rectangle. Find an N such that Σ(N, N) represents Σ(n, m) with four-place accuracy for n > N, m > N.


2.3 | Definition of Certain Simple Integrals. Convergence and the Cauchy Criterion


*The notation ∫_a^b f(x) dx denotes, of course, the integral of the 1-form f(x) dx over the interval {a < x < b} oriented from a to b. Unfortunately there is no such convenient notation for indicating orientations of 2-dimensional integrals.


The greatest difficulty in giving a precise formulation of 
the informal definition of §2.2 lies in describing precisely 
what is meant by a ‘finely divided polygonal approxima- 
tion to the domain of integration’. This section is devoted 
to giving a precise formulation of the definition of §2.2 
for integrals in which the domain of integration is a 
simple domain for which a ‘finely divided polygonal 
approximation’ can be described explicitly. 

Consider first the case of an integral ∫_R A dx dy in which A dx dy is a (variable) 2-form on the xy-plane and in which the domain of integration R is a rectangle


$$R = \{a < x < b,\ c < y < d\}$$


in the xy-plane oriented counterclockwise. [If R is oriented clockwise, then ∫_R A dx dy is defined to be −∫_{−R} A dx dy, where −R denotes R with the opposite orientation. This is in accord with the usual definition* ∫_b^a f(x) dx = −∫_a^b f(x) dx of integrals over intervals {a < x < b} which are oriented from right to left.] In
this case a ‘finely divided polygonal approximation’ to 
the domain of integration R can be obtained simply by 
drawing lines x = const., y = const. to divide R into 
subrectangles and by orienting each subrectangle counter- 
clockwise in agreement with the orientation of R. An 
approximating sum corresponding to this ‘finely divided 
approximation to R’ is obtained by ‘evaluating’ 
A(x, y) dx dy on each subrectangle and adding over all 
rectangles. To form such an approximating sum it is 
necessary to choose: 


lines x = xᵢ, where

$$a = x_0 < x_1 < x_2 < \cdots < x_{m-1} < x_m = b,$$

lines y = yⱼ, where

$$c = y_0 < y_1 < y_2 < \cdots < y_{n-1} < y_n = d,$$

and points Pᵢⱼ, one in each of the mn rectangles

$$R_{ij} = \{(x, y) : x_{i-1} < x < x_i,\ y_{j-1} < y < y_j\}.$$

The approximating sum is then

$$\sum_{i=1}^{m}\sum_{j=1}^{n} [A(P_{ij})][\text{oriented area of } R_{ij}] = \sum_{i=1}^{m}\sum_{j=1}^{n} A(P_{ij})(x_i - x_{i-1})(y_j - y_{j-1}). \qquad (1)$$

The choices xᵢ, yⱼ, Pᵢⱼ will be denoted collectively by the letter α and the corresponding sum (1) by Σ(α).†


†This sum is a number once A, R are given and all the choices α are made. It is helpful to imagine that a computer has been programmed to compute this number Σ(α) whenever it is provided with the choices α.

See Appendix 7.
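Taking the footnote literally, the Python sketch below computes Σ(α) from the choices α: division points xs, ys and a point in each subrectangle. It also computes the mesh size |α| introduced next. The names and the random generator of choices are illustrative assumptions.

```python
import random

# Sketch of the approximating sum (1) on a rectangle R = {a < x < b, c < y < d}.
# The choices alpha are the division points xs = [x_0, ..., x_m],
# ys = [y_0, ..., y_n] and one point points[i][j] in each subrectangle R_ij
# (indices start at 0 here).  The random generator of choices is only an
# illustration.

def sigma(A, xs, ys, points):
    """Sum of A(P_ij) times the area of R_ij (all R_ij oriented counterclockwise)."""
    return sum(A(*points[i][j]) * (xs[i + 1] - xs[i]) * (ys[j + 1] - ys[j])
               for i in range(len(xs) - 1) for j in range(len(ys) - 1))

def mesh(xs, ys):
    """The mesh size |alpha|: the largest side of any subrectangle."""
    return max(max(t1 - t0 for t0, t1 in zip(xs, xs[1:])),
               max(t1 - t0 for t0, t1 in zip(ys, ys[1:])))

def random_choices(a, b, c, d, m, n, rng):
    """Random division points (with a, b and c, d as endpoints) and points P_ij."""
    xs = [a] + sorted(rng.uniform(a, b) for _ in range(m - 1)) + [b]
    ys = [c] + sorted(rng.uniform(c, d) for _ in range(n - 1)) + [d]
    points = [[(rng.uniform(xs[i], xs[i + 1]), rng.uniform(ys[j], ys[j + 1]))
               for j in range(n)] for i in range(m)]
    return xs, ys, points

rng = random.Random(0)
A = lambda x, y: x * y          # a hypothetical integrand on R = [0, 1] x [0, 1]
for m in (5, 50, 200):
    xs, ys, pts = random_choices(0.0, 1.0, 0.0, 1.0, m, m, rng)
    print(f"|alpha| = {mesh(xs, ys):.4f}   Sigma(alpha) = {sigma(A, xs, ys, pts):.5f}")
```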




*It is an axiom of the real number system that a number is defined once such a process for computing it to any prescribed degree of accuracy has been given (see §9.7).


To say that the approximating sums Σ(α) approach a limit as the approximation is refined means essentially that the choices α do not significantly affect the result Σ(α), provided only that the polygons Rᵢⱼ on which the approximating sum Σ(α) is based are all small. Let |α| be the largest dimension of any of the rectangles specified by α, that is,

$$|\alpha| = \max(x_1 - x_0,\ x_2 - x_1,\ \ldots,\ x_m - x_{m-1},\ y_1 - y_0,\ \ldots,\ y_n - y_{n-1}).$$

|α| is called the mesh size of α. For the limit to exist means that if the mesh size |α| is small, the resulting sum Σ(α) is insensitive to the choices to a very high degree. That is, another approximating sum Σ(α′) in which the mesh size is similarly small will differ very little from Σ(α). Specifically, convergence of the sums Σ(α) is defined by the Cauchy Convergence Criterion:


The integral ∫_R A dx dy is said to converge if it is true that given any margin for error ε there is a mesh size δ such that

any two approximating sums Σ(α), Σ(α′) in which the mesh sizes are both less than δ differ by less than the prescribed margin for error ε,

that is,

$$|\alpha| < \delta,\ |\alpha'| < \delta \quad \text{imply} \quad |\Sigma(\alpha) - \Sigma(\alpha')| < \varepsilon.$$


If this is the case, then the limiting value can be defined to be the number* which is determined to within any margin of error in the obvious way. For example, suppose it is desired to find the limiting value with an accuracy of five decimal places (see Exercise 2, §2.2). Set ε = .00001, let δ be the corresponding mesh size, choose any α with |α| < δ, form Σ(α), and round to five decimal places. The number which results is determined, except for ±1 in the last place, solely by the 2-form A dx dy and the domain of integration R. If a different set of choices α is used, no matter how different they may be, the result will be the same except for at most ±1 in the last place. In particular, this holds no matter how much finer a subdivision of R they may involve. The limiting value defined in this way is called the integral of A dx dy over R and is denoted ∫_R A dx dy.

If this is not the case, that is, if the integral does not 
converge, then it is said that the integral diverges and 
that the limiting value ∫_R A dx dy does not exist.
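The criterion can be probed experimentally. The Python sketch below is an illustration, not a proof, and its setup is an assumption: the unit square, A(x, y) = xy, and the random way of generating choices. For a given δ it draws several sets of choices α with |α| < δ and reports how far apart the resulting sums are.

```python
import random

# Sketch of what the Cauchy criterion asserts, tested empirically (this is an
# illustration, not a proof).  Assumptions of this sketch: R is the unit
# square, the integrand is A(x, y) = x*y, and choices alpha are generated at
# random subject only to the mesh bound |alpha| < delta.

def random_partition(delta, rng):
    """Points 0 = t_0 < t_1 < ... < t_m = 1 with every gap less than delta."""
    pts = [0.0]
    while pts[-1] < 1.0:
        pts.append(min(1.0, pts[-1] + rng.uniform(0.1 * delta, 0.99 * delta)))
    return pts

def random_sum(A, delta, rng):
    """Sigma(alpha) for one random set of choices alpha with |alpha| < delta."""
    xs, ys = random_partition(delta, rng), random_partition(delta, rng)
    total = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            P = (rng.uniform(x0, x1), rng.uniform(y0, y1))  # the point P_ij
            total += A(*P) * (x1 - x0) * (y1 - y0)
    return total

def spread(A, delta, trials=10, seed=0):
    """Largest difference |Sigma(alpha) - Sigma(alpha')| among the trials."""
    rng = random.Random(seed)
    sums = [random_sum(A, delta, rng) for _ in range(trials)]
    return max(sums) - min(sums)

A = lambda x, y: x * y
for delta in (0.2, 0.05, 0.02):
    print(delta, spread(A, delta))
```

For this A the spread is at most 4δ. When the mesh is below δ, xy varies by at most 2δ on any subrectangle, so the argument given later in this section bounds |Σ(α) − Σ(α′)| by 2δ + 2δ.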




*It is easy to show that convergence
or divergence of the integral, as well 
as the limiting value in the case of 
convergence, is independent of the 
choice of the enclosing rectangle R 
(see Exercise 6). 


†Meaning ‘reasonable’ curves, that
is, differentiable curves. The detailed 
statement of the theorem is given in 
Chapter 6. 


‡This means that there is a number M such that |A(P)| < M at all points P of D.


§This means that given a point (x̄, ȳ) of D and a margin for error ε, there is a distance δ such that the value of A varies by less than ε on the square {|x − x̄| < δ, |y − ȳ| < δ}. That is, given (x̄, ȳ) in D and ε > 0, there is a δ > 0 such that |A(x, y) − A(x̄, ȳ)| < ε whenever |x − x̄| < δ, |y − ȳ| < δ. Intuitively it means that if (x, y) is near (x̄, ȳ), then A(x, y) is near A(x̄, ȳ).


This completes the definition of the integral of a 
2-form A(x, y) dx dy over an oriented rectangle R. The 
integral ∫_R A dx dy either is a number, defined above, or
it does not exist. Only minor modifications are necessary 
to define the integral of a 3-form over an oriented 
rectangular parallelopiped or the integral of a 1-form 
over an oriented interval. The integral of a 2-form over 
a more general oriented domain D of the plane—for 
example, over the disk D = {(x, y): x² + y² < 1}
oriented counterclockwise—can be defined by the simple 
trick of taking a rectangle R containing the domain D, 
setting the integrand equal to zero outside D, and pro- 
ceeding as before. Assuming then that D is an oriented 
domain of the plane which can be enclosed in a rectangle 
R, the integral of a 2-form A dx dy over D is defined by 


setting

$$A_D(x, y) = \begin{cases} A(x, y) & \text{if } (x, y) \text{ is in } D,\\ 0 & \text{if } (x, y) \text{ is not in } D,\end{cases}$$

and defining

$$\int_D A(x, y)\,dx\,dy = \int_R A_D(x, y)\,dx\,dy,$$


where the orientation of R is chosen to agree with that of 
D.* The integral of a 3-form over any oriented domain in 
space which can be enclosed in a rectangular parallelopiped, and the integral of a 1-form over any oriented
domain of the line which can be enclosed in an interval, 
are defined by the same trick. Such domains are called 
bounded domains (that is, they are domains which stay 
within certain finite bounds). In summary, the integral 
of a k-form over a bounded, oriented domain of k- 
dimensional space has been defined for k = 1, 2, 3. 
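The Python code below illustrates the trick. It is a sketch, and D, A and the grid are assumptions chosen for the example. It integrates A = 1 over the disk D = {x² + y² < 1} by integrating A_D over the square R = [−1, 1] × [−1, 1], using the centers of the subrectangles as the points Pᵢⱼ. By Exercise 1 at the end of this section, the value of this integral is π, which makes it easy to watch the sums settle.

```python
import math

# Sketch of the zero-extension trick for D = {x^2 + y^2 < 1} inside the square
# R = [-1, 1] x [-1, 1].  Assumptions of this sketch: A = 1 (so the integral
# over D is the area of D) and approximating sums on the uniform n x n
# subdivision of R with each P_ij at the center of R_ij.

def extend_by_zero(A, in_D):
    """Return the function A_D: equal to A on D and to 0 off D."""
    return lambda x, y: A(x, y) if in_D(x, y) else 0.0

def midpoint_sum(F, a, b, c, d, n):
    """Approximating sum (1) for F dx dy on [a, b] x [c, d], centers as P_ij."""
    h, k = (b - a) / n, (d - c) / n
    return sum(F(a + (i + 0.5) * h, c + (j + 0.5) * k) * h * k
               for i in range(n) for j in range(n))

in_disk = lambda x, y: x * x + y * y < 1.0
A_D = extend_by_zero(lambda x, y: 1.0, in_disk)
for n in (10, 100, 400):
    s = midpoint_sum(A_D, -1.0, 1.0, -1.0, 1.0, n)
    print(n, s, s - math.pi)   # sum, and its difference from the area pi
```

The squares that straddle the circle, where A_D jumps from 1 to 0, are the source of the remaining error. This is the part of the sum that the proof outline below has to control.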

However, the definition begs a substantial part of the 
question, namely, does the integral converge? Generally 
speaking, one can say that in all reasonable cases the 
answer is yes. In particular the answer is yes in all cases 
which are considered in the following chapters, namely 
those cases in which the bounded domain D is a region 
enclosed by a finite number of curves† and the function A is
bounded‡ and continuous§ on D.

An outline of the proof of the fact that these conditions 
on A and D guarantee the convergence of the integral is 
given below. This proof involves some rather difficult 
mathematical arguments, and one should not expect to 
understand it completely on the first or second reading. 




*A subdivision S' is said to be a 
refinement of a subdivision S if it is 
obtained from S by adding more 
lines x = const., y = const. 


However, there is no better way to grasp the meaning of 
convergence of integrals than to study the proof of this 
theorem, particularly in concrete cases such as those of 
Exercises 1, 2, 3 at the end of this section, and Exercises 
1, 3, 4, 5 of the preceding section. 


Outline of proof 


Let D be a bounded, oriented domain enclosed by a 
finite number of smooth curves. Let R be a rectangle 
containing D and let A be the function on R which is the 
given continuous function at points of D and which is 0 
at points of R not in D (this function was denoted A_D
above). The orientation of R, which is taken to agree 
(counterclockwise or clockwise) with that of D, can be 
assumed to be counterclockwise, since this affects only 
the sign of the result. It is to be shown that for such a 
function A on R the sums Σ(α) defined by (1) converge.

For the sums Σ(α) to converge means that Σ(α′) can be made to differ arbitrarily little from Σ(α) by making |α| and |α′| small. In particular, suppose α′ differs from α only in the choice of the points Pᵢⱼ and not in the choice of rectangles (that is, R′ᵢⱼ = Rᵢⱼ). Then

$$|\Sigma(\alpha) - \Sigma(\alpha')| = \Big|\sum_{i,j} [A(P_{ij}) - A(P'_{ij})][\operatorname{area}(R_{ij})]\Big|$$

can be made small by making |α| = |α′| small. Let S denote the subdivision of R into subrectangles Rᵢⱼ common to α and α′. This implies that the maximum value can be made small, that is,

$$U(S) = \sum_{i,j} \Big|\max_{P, P' \text{ in } R_{ij}} \{A(P') - A(P)\}\Big|\,[\operatorname{area}(R_{ij})] \qquad (2)$$

can be made small by making the mesh size |S| of S small. (Since the maximum is ≥ 0, all terms are nonnegative and the absolute value signs may be dropped.) Specifically, if the integral converges, then given any margin for error ε there is a mesh size δ such that the number U(S) defined by (2) is less than ε whenever the mesh size of the subdivision S is less than δ, i.e. |S| < δ implies U(S) < ε. In brief, if the integral converges, then U(S) → 0 as |S| → 0.

The converse of this statement is also true; that is, if U(S) → 0 as |S| → 0, then the integral converges. One need only note that if α is based on a subdivision S and if α′ is based on a subdivision S′ which is a refinement* of S, then it is still true that |Σ(α) − Σ(α′)| ≤ U(S). (Exercise 4.) Then if α, α′ are any two sets of data, there is a third set of data, α″, based on a refinement of both S and S′ (merely take S″ to include all lines x = const., y = const. specified by either S or S′), and hence

$$\begin{aligned} |\Sigma(\alpha) - \Sigma(\alpha')| &= |\Sigma(\alpha) - \Sigma(\alpha'') + \Sigma(\alpha'') - \Sigma(\alpha')|\\ &\le |\Sigma(\alpha) - \Sigma(\alpha'')| + |\Sigma(\alpha'') - \Sigma(\alpha')|\\ &\le U(S) + U(S'). \end{aligned}$$

If U(S) → 0 as |S| → 0, this can be made small by making |S| and |S′| both small. That is, given ε, there is a mesh size δ such that |S| < δ and |S′| < δ imply |Σ(α) − Σ(α′)| < ε, and therefore the integral converges. Thus the integral converges if and only if U(S) → 0 as |S| → 0.

This important conclusion is perhaps more compre- 
hensible when it is formulated as follows: The number 
U(S) represents the ‘uncertainty’ of an approximating 
sum Σ(α) to ∫_D A dx dy based on the subdivision S.
Any approximating sum based on any refinement of S 
differs from any approximating sum based on S by at 
most U(S). Thus further refinement changes the result by 
at most the ‘uncertainty’ U(S). The integral converges if 
and only if this uncertainty can be made small by making 
the mesh size small. 
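The uncertainty can be computed outright when A is simple enough. The Python sketch below assumes A(x, y) = xy on the unit square, which is an example and not from the text. Because this A increases in both variables there, the maximum in (2) on each subrectangle is the value at its upper-right corner minus the value at its lower-left corner. The sketch also checks, on one refinement, that sums based on a refinement stay within U(S) of sums based on S.

```python
import random

# Sketch for the 'uncertainty' U(S) of (2).  Assumed example: A(x, y) = x*y on
# R = [0, 1] x [0, 1].  This A increases in x and in y on R, so on each
# subrectangle the largest value of A(P') - A(P) occurs with P' at the
# upper-right corner and P at the lower-left corner.

def A(x, y):
    return x * y

def cells(xs, ys):
    """The subrectangles (x0, x1, y0, y1) of the subdivision by xs and ys."""
    return [(x0, x1, y0, y1) for x0, x1 in zip(xs, xs[1:]) for y0, y1 in zip(ys, ys[1:])]

def uncertainty(xs, ys):
    """U(S) for the subdivision S by the lines x = xs[i], y = ys[j]."""
    return sum((A(x1, y1) - A(x0, y0)) * (x1 - x0) * (y1 - y0)
               for x0, x1, y0, y1 in cells(xs, ys))

def random_sum(xs, ys, rng):
    """An approximating sum on the subdivision, each P_ij chosen at random."""
    return sum(A(rng.uniform(x0, x1), rng.uniform(y0, y1)) * (x1 - x0) * (y1 - y0)
               for x0, x1, y0, y1 in cells(xs, ys))

def uniform(n):
    return [i / n for i in range(n + 1)]

rng = random.Random(1)
S = uniform(10)
S_fine = sorted(set(S) | {rng.random() for _ in range(30)})  # a refinement of S
for n in (10, 20, 40):
    print(n, uncertainty(uniform(n), uniform(n)))  # equals 1/n up to rounding
gap = abs(random_sum(S, S, rng) - random_sum(S_fine, S_fine, rng))
print(gap, "<=", uncertainty(S, S))
```

For the uniform n × n subdivision, U(S) works out to exactly 1/n. So in this example U(S) → 0 as |S| → 0, and the integral converges.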

The problem, therefore, is to show that the assumptions on A and D are sufficient to guarantee that U(S) → 0 as |S| → 0. To do this, it is useful to decompose the sum (2) which defines U(S) into two parts U(S) = U₁(S) + U₂(S) as follows. A term of U(S) corresponding to a rectangle Rᵢⱼ of S is counted in the first sum U₁(S) if Rᵢⱼ is contained entirely in D. It is counted in the second sum U₂(S) if Rᵢⱼ lies partly in D and partly outside D. (If Rᵢⱼ lies entirely outside D, then A is identically zero on Rᵢⱼ and the corresponding term in U(S) is zero.) It will be shown that the numbers U₁(S) and U₂(S) are both small when |S| is small (but for quite different reasons).
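The split can be examined numerically. The Python sketch below rests on assumptions chosen for the example: D is taken to be the closed unit disk, A(x, y) = x² + y² on D, and S is the uniform n × n subdivision of R = [−1, 1] × [−1, 1]. On a square inside D, the maximum of A(P′) − A(P) is the range of x² + y² over the square. On a square that straddles the circle, A_D rises to 1 on the circle and drops to 0 outside D, so the term is just the area of the square.

```python
# Sketch of the split U(S) = U1(S) + U2(S).  Assumed example: D is the closed
# unit disk, A(x, y) = x^2 + y^2 on D (so A_D = 0 off D), R = [-1, 1] x [-1, 1],
# and S is the uniform n x n subdivision of R.

def sq_range(t0, t1):
    """Minimum and maximum of t^2 for t0 <= t <= t1."""
    lo = 0.0 if t0 <= 0.0 <= t1 else min(t0 * t0, t1 * t1)
    return lo, max(t0 * t0, t1 * t1)

def split_uncertainty(n):
    h = 2.0 / n
    U1 = U2 = 0.0
    for i in range(n):
        xlo, xhi = sq_range(-1 + i * h, -1 + (i + 1) * h)
        for j in range(n):
            ylo, yhi = sq_range(-1 + j * h, -1 + (j + 1) * h)
            lo, hi = xlo + ylo, xhi + yhi      # range of x^2 + y^2 on R_ij
            if hi <= 1.0:                      # R_ij inside D: A_D = A there
                U1 += (hi - lo) * h * h
            elif lo <= 1.0:                    # R_ij partly in, partly out of D
                U2 += (1.0 - 0.0) * h * h      # A_D ranges from 0 up to 1
            # otherwise R_ij lies outside D and contributes nothing
    return U1, U2

for n in (10, 100, 400):
    print(n, split_uncertainty(n))
```

Both parts shrink as n grows. U₁ shrinks because A varies little on a small square, and U₂ shrinks because the straddling squares have small total area. These are the two reasons the text goes on to give.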

The sum U₂(S) is small because the total area of the rectangles Rᵢⱼ which lie partly inside and partly outside D is small. More specifically, each term is at most (A(P′) − A(P))[area(Rᵢⱼ)] ≤ 2M[area(Rᵢⱼ)], where M is a bound on the bounded function A. So the total U₂(S) is at most 2M times the total area of such Rᵢⱼ. It is intuitively clear that if the boundary of D consists of a finite number of reasonable curves, then the total area of
such rectangles Rᵢⱼ can be made arbitrarily small by making the mesh size small. The rigorous proof of this statement must await a precise definition of ‘reasonable’ curves. (See Chapter 6 for the proof. For specific examples this statement can be proved directly; see Exercise 1.)

The sum U₁(S) is small because |A(P′) − A(P)| is small (when P′ is near P) by the continuity of A. Specifically, it can be shown that for every ε > 0 there is a δ > 0 such that |A(P) − A(P′)| < ε whenever |P − P′| < δ (meaning that both the x-coordinates and the y-coordinates of P, P′ differ by less than δ). In other words, A is uniformly continuous on D (see §9.3). Then, if |S| < δ, it follows that |U₁(S)| is at most ε times the total area of the rectangles Rᵢⱼ inside D. Here ε is arbitrarily small, and the total area of the rectangles in D is bounded (by the area of a rectangle containing D, for instance). It follows that U₁(S) can be made small by making |S| small. (Again, the rigorous proof is postponed to Chapter 6. In specific examples it can be proved directly that U₁(S) → 0 as |S| → 0; see, for example, Exercise 1, §2.2.)

This completes the outline of the proof that the integral ∫_D A dx dy converges.


Exercises


1 Computation of π. The number π is defined to be the area of the circle (disk) of radius 1, that is,

$$\pi = \int_D dx\,dy$$

where D is the disk {(x, y): x² + y² < 1}. Given an integer n, draw a square grid of n² squares and (n + 1)² vertices, labeling the vertices with integer coordinates (p, q), 0 ≤ p ≤ n, 0 ≤ q ≤ n. Mark the vertices (p, q) for which p² + q² < n² with an X. Shade all squares for which the inner vertex (p, q) has an X but for which the outer vertex (p + 1, q + 1) does not. Let

Uₙ = # of shaded squares (U = uncertain),
Cₙ = # of squares whose vertices all have X's (C = certain),

$$A_n = 4\left(\frac{1}{n^2}\,C_n + \frac{1}{2n^2}\,U_n\right) \qquad (A = \text{approximation}).$$

Find A₁₀. Find A₂₀. Show that any approximating sum to the integral ∫_D dx dy which is based on the subdivision by lines x = ±p/n, y = ±q/n, or on any refinement of this subdivision, differs from Aₙ by at most 2n⁻²Uₙ. What accuracy (how many decimal places) does this estimate guarantee for the approximation A₁₀ ≈ π? What is the actual accuracy? What accuracy does it guarantee for A₂₀? What is the actual accuracy? Find a formula for Uₙ. [Count the crossings of the lines x = const. and y = const. separately.] How large would n have to be for this estimate to guarantee two-place accuracy? Note that the approximations are in fact more accurate than this estimate of the error would indicate. Explain this.


*(Note added in 1980) G. H. Hardy (Collected Papers, Vol. 2, p. 290) did disprove |πr² − Nᵣ| < M√r. He conjectured that < M r^{(1/2)+ε} was correct, and this has never been disproved. The statement about Gauss conjecturing < M√r is not historically reliable.


2 Many mathematicians, notably Karl Friedrich Gauss 
(1777-1855), have investigated the number N, of points 
(+p, +g) with integer coordinates contained in the circle of 
radius r (including points on the circle). 


(a) Find Ns, N7. 
(b) Show that N,/r? is an approximating sum to 7 = 


fp dx dy. [Subdivide the plane by lines x = + > + 
i y= + m + 1 and evaluate at midpoints of 
2r r 2r 

squares. ] 

(c) Use the argument of Exercise 1 to prove that the 
number of squares which lie on the boundary of D is 
< 8r + 4). 

(d) Show that any approximating sum based on any 
refinement of the subdivision of (b) differs from 
N,/r? by less than 8(r + 4)/r?, and hence that the 
limiting value m differs from N,/r? by less than 
8(r + 4)/r?, i.e. 


Imr? — N,| < 8(r + 3). 


As was seen in the preceding problem, this estimate of the 
error is much too large; Gauss conjectured that the error is 
of the order of magnitude of vr, that is, that there is a 
number M such that |rr? — N,| < Mvr, but this conjecture 
has never been proved or disproved. * 
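The counts in Exercises 1 and 2 are easy to check by brute force. The Python sketch below (hypothetical helper names) implements the marking rule of Exercise 1 literally and counts lattice points for Exercise 2. It is meant only for checking hand computations, not as a substitute for the estimates the exercises ask for.

```python
import math


def certain_and_uncertain(n):
    """Return (C_n, U_n) for the n-by-n grid of Exercise 1 (one quadrant of the disk)."""
    certain = uncertain = 0
    for p in range(n):
        for q in range(n):
            inner_marked = p * p + q * q < n * n                # vertex (p, q) has an X
            outer_marked = (p + 1) ** 2 + (q + 1) ** 2 < n * n  # vertex (p+1, q+1) has an X
            if outer_marked:        # then all four vertices of the square have X's
                certain += 1
            elif inner_marked:      # a shaded ("uncertain") square
                uncertain += 1
    return certain, uncertain


def approx_pi(n):
    """A_n = (4 / n^2) * (C_n + U_n / 2)."""
    c, u = certain_and_uncertain(n)
    return 4.0 * (c + 0.5 * u) / n ** 2


def lattice_count(r):
    """N_r: the number of integer points (p, q) with p^2 + q^2 <= r^2."""
    return sum(1 for p in range(-r, r + 1) for q in range(-r, r + 1) if p * p + q * q <= r * r)


for n in (10, 40, 160):
    c, u = certain_and_uncertain(n)
    # A_n, the guaranteed error bound 2 U_n / n^2, and the actual error.
    print(n, approx_pi(n), 2 * u / n ** 2, abs(approx_pi(n) - math.pi))
```

Comparing the last two printed columns shows how pessimistic the bound $2n^{-2}U_n$ is, which is the point of the final question in Exercise 1.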


3 Show that the integral $\int_D dx\,dy$ defining π converges.
[This is of course a special case of the theorem proved in the
text, so it is a question of extracting the necessary parts of that
proof. Take a fine subdivision by lines $x = \pm p/n$, $y = \pm q/n$,
take $B_1, \ldots, B_N$ to be the squares which lie on the boundary,
estimate their area, and estimate for any subdivision S the
total area of the squares which touch one of the $B_i$ in terms
of the mesh size $|S|$. Show that this total area, and hence
$U(S)$, can be made small by making $|S|$ small.]




4 Show that if $\sum(\alpha)$ is an approximating sum based on the
subdivision S and if $\sum(\alpha')$ is any approximating sum based
on a subdivision S' which is a refinement of S, that is, for
which every rectangle of S' is contained in or equal to
some rectangle of S, then

$$\Big|\sum(\alpha) - \sum(\alpha')\Big| \le U(S) = \sum_{\substack{\text{rectangles}\\ R_{ij} \text{ of } S}} \Big[\max_{P, P' \text{ in } R_{ij}} \{A(P) - A(P')\}\Big]\,[\text{area } R_{ij}].$$

[Lump together all terms of $\sum(\alpha')$ which correspond to the
same rectangle $R_{ij}$ of S.]
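A quick numerical sanity check of this inequality (not a proof) is sketched below in Python. The integrand $A(x, y) = x + y^2$ on $[0, 1]^2$ is a hypothetical choice: because it increases in each variable, its maximum and minimum on each rectangle sit at the upper-right and lower-left corners, so $U(S)$ can be computed exactly. Evaluation points and refinements are chosen at random.

```python
import random


def A(x, y):
    return x + y * y   # hypothetical integrand, increasing in x and in y on [0, 1]^2


def approximating_sum(xs, ys, rng):
    """Sum of A(P) * area over the rectangles of the grid xs x ys, P chosen at random in each."""
    total = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            total += A(rng.uniform(x0, x1), rng.uniform(y0, y1)) * (x1 - x0) * (y1 - y0)
    return total


def U(xs, ys):
    """Sum of [max A - min A] * area; for this A the maximum on each rectangle is at the
    upper-right corner and the minimum at the lower-left corner."""
    return sum((A(x1, y1) - A(x0, y0)) * (x1 - x0) * (y1 - y0)
               for x0, x1 in zip(xs, xs[1:]) for y0, y1 in zip(ys, ys[1:]))


def refine(points, k, rng):
    """Insert k random points into every interval of a subdivision (a refinement of it)."""
    out = [points[0]]
    for a, b in zip(points, points[1:]):
        out.extend(sorted(rng.uniform(a, b) for _ in range(k)))
        out.append(b)
    return out


rng = random.Random(0)
xs, ys = [0.0, 0.3, 0.5, 1.0], [0.0, 0.2, 0.9, 1.0]
coarse = approximating_sum(xs, ys, rng)
fine = approximating_sum(refine(xs, 5, rng), refine(ys, 5, rng), rng)
print(abs(coarse - fine), "<=", U(xs, ys))
```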


5 Show that if an integral converges and if L is the limiting
value then any approximating sum $\sum(\alpha)$ based on the subdivision
S is within U(S) of the limiting value: $|L - \sum(\alpha)| \le U(S)$.
[This observation was used in Exercise 2. It suffices to
show that $|L - \sum(\alpha)| \le U(S) + \epsilon$ for all $\epsilon > 0$. Note that
by definition of L, $|L - \sum(\alpha')|$ can be made less than any
$\epsilon > 0$ by making $|S'|$ small, where S' is the subdivision of $\alpha'$.
Take S' to be a refinement of S.]


6 Irrelevance of the enclosing rectangle. Suppose that $A_D$ is
a bounded function which is zero outside the domain D, and
suppose that the rectangles R, R' both contain D. Show that
$\int_R A_D\,dx\,dy = \int_{R'} A_D\,dx\,dy$. [First show that one may as
well assume that $R' \subset R$ and, in fact, that $D = R'$. Then since
every approximating sum to $\int_{R'}$ is also an approximating sum
to $\int_R$, it is easy to show:

$\int_R$ converges implies $\int_{R'}$ converges;

if both converge the limits are equal.

The only statement remaining is

$\int_{R'}$ converges implies $\int_R$ converges.

This is proved by showing that the rectangles of a subdivision
S of R which lie on the boundary of R' make an
insignificant contribution to U(S).]


7 The definition of ‘integral’ given in the text was first given
by Bernhard Riemann (1826-1866), in whose honor this
notion of integration is called ‘Riemann integration’ and the
approximating sums are called ‘Riemann sums’. Riemann
gave the following necessary and sufficient condition for the
convergence of the integral $\int_R A\,dx\,dy$, where R is a rectangle
and A is an arbitrary function on R:

Riemann’s criterion. For any positive number σ and any
subdivision S of R, let $s(S, \sigma)$ be the total area of those
rectangles of S for which the variation $U_{ij} > \sigma$, i.e. in which
there are points P, P' with $A(P) - A(P') > \sigma$. Then the
integral $\int_R A\,dx\,dy$ converges if and only if

(1) A is bounded on R, and

(2) for every $\sigma > 0$ the total area $s(S, \sigma)$ of those rectangles
on which the variation is $> \sigma$ can be made small
by making the mesh size $|S|$ small; that is, given $\sigma > 0$
and $\epsilon > 0$ there is a $\delta > 0$ such that $s(S, \sigma) < \epsilon$
whenever $|S| < \delta$. In short, for every $\sigma > 0$, $s(S, \sigma) \to 0$
as $|S| \to 0$.

Prove Riemann’s criterion. [From the argument of the text
it suffices to show that $U(S) \to 0$ as $|S| \to 0$ if and only if (1)
and (2) hold. To prove ‘only if’, show that if A is unbounded
then for any fixed S the maximum U(S) is in fact infinite,
whereas if (2) is false for some σ then $U(S) \ge \sigma \cdot s(S, \sigma)$ does
not go to zero as $|S| \to 0$. To prove ‘if’, split U(S) into two
sums, one consisting of terms in which $U_{ij} \le \sigma$ and the other
of terms in which $U_{ij} > \sigma$, make the first sum small by making σ
small, and make the second sum small by using (1) and (2).]
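Condition (2) can be watched in a simple case. In the Python sketch below (a hypothetical example, not taken from the text) A is 1 to the left of the line x = 1/3 and 0 to its right on $R = [0, 1]^2$, and S runs through uniform $n \times n$ grids. For $0 < \sigma < 1$ only the column of squares straddling x = 1/3 has variation $> \sigma$, so $s(S, \sigma) = 1/n$; exact rational arithmetic avoids rounding questions at the line itself.

```python
from fractions import Fraction


def A(x, y):
    # Hypothetical bounded integrand on R = [0, 1]^2: 1 to the left of the line x = 1/3, else 0.
    return 1 if x < Fraction(1, 3) else 0


def s(n, sigma):
    """Total area of the squares of the uniform n-by-n grid on [0, 1]^2 on which the
    variation max A - min A exceeds sigma, computed exactly with fractions."""
    total = Fraction(0)
    for i in range(n):
        x0, x1 = Fraction(i, n), Fraction(i + 1, n)
        # A depends only on x and does not increase, so on every square of the column
        # [x0, x1] x [0, 1] its largest value is A(x0) and its smallest is A(x1).
        variation = A(x0, 0) - A(x1, 0)
        if variation > sigma:
            total += x1 - x0   # area of the whole column of n squares
    return total


for n in (3, 10, 100):
    print(n, s(n, 0.5), s(n, 1))
```

Only uniform grids are examined here, whereas condition (2) concerns all subdivisions of small mesh size; for this A the straddling column always has width at most $|S|$, so the conclusion is the same.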


8 In the text the condition ‘the integral diverges’ is defined
to mean the negation of the statement ‘the integral converges’.
State precisely what this condition says about the approximating
sums $\sum(\alpha)$; in other words, reformulate the denial of
the Cauchy Criterion as a positive statement. Show that
$\int_R A\,dx\,dy$ diverges when R is a rectangle and when $A(x, y)$
is the function

$$A(x, y) = \begin{cases} 1 & \text{if } x, y \text{ are both rational numbers},\\ 0 & \text{if one or both coordinates are irrational}. \end{cases}$$

If L is a number with the property ‘Given $\epsilon > 0$ and given
$\delta > 0$ there exists an approximating sum $\sum(\alpha)$ to $\int_R A\,dx\,dy$
with $|L - \sum(\alpha)| < \epsilon$ and $|\alpha| < \delta$’, what values can L have?
[Use the fact that every interval contains both rational and
irrational numbers to show that L can be any number between
0 and the oriented area of R.]


9 Suppose that a, b, c, d are four numbers such that the lines
$ax + by$ = const. and $cx + dy$ = const. are not parallel.
Then a bounded domain can also be subdivided using lines
$ax + by$ = const. and $cx + dy$ = const. Describe how to
form an approximating sum to $\int_D A\,dx\,dy$ based on such a
subdivision. Define the mesh size of such a subdivision to be
the largest dimension (in the x- or y-direction) of any parallelogram
of the subdivision. Sketch a proof of the fact that if D is a
rectangle and if A is continuous then these approximating
sums converge. The proof that their limit is $\int_D A\,dx\,dy$ is
examined in Exercise 10.


10 Given an arbitrary subdivision of the plane into polygons,
describe how to form an approximating sum to
$\int_D A\,dx\,dy$ based on the subdivision. Define ‘mesh size’.
Sketch a proof of the fact that if D is a rectangle and if A is
continuous then these approximating sums converge. Show
that their limit is $\int_D A\,dx\,dy$. Why is it necessary to restrict to
polygonal subdivisions? The approximating sums of the text
were based on rectangular subdivisions because they are
easiest to describe (being defined simply by the numbers
$x_i$, $y_j$) and because it is quite difficult to define precisely what
is meant by ‘an arbitrary subdivision of the plane into polygons’.


2.4 Integrals and Pullbacks


In §2.2 the integral of a k-form over an oriented k- 
dimensional domain was described as a limit of approxi- 
mating sums, an approximating sum being formed by 
constructing a finely divided oriented polygonal domain 
which approximates the domain of integration, ‘evaluat- 
ing’ the k-form on each polygon of this polygonal do- 
main, and adding; the ‘evaluation’ involves first 
evaluating the k-form at some point near the polygon to 
obtain a constant k-form, and then proceeding as in 
Chapter 1. 

In §2.3 this description was made the basis of a rigorous 
definition for cases in which the domain of integration is 
a rectangle (interval, rectangular parallelopiped) by 
describing explicitly what is meant by a ‘finely divided 
polygonal approximation’ to a rectangle. This definition 
was extended to arbitrary bounded domains in k-space 
by the simple trick of enclosing such a domain in a 
k-dimensional rectangle, setting the integrand equal to 
zero outside the domain, and proceeding as before. 
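The enclosing-rectangle device is easy to imitate numerically. The Python sketch below (function names and the example are hypothetical) forms approximating sums on a uniform $n \times n$ subdivision of an enclosing rectangle, evaluating at the center of each small rectangle and using the value 0 whenever the center lies outside D.

```python
def domain_sum(A, inside, n, x_range=(0.0, 1.0), y_range=(0.0, 1.0)):
    """Approximating sum for the integral of A dx dy over D: subdivide the enclosing
    rectangle x_range x y_range into n-by-n equal rectangles, evaluate at their centers,
    and use the value 0 whenever a center lies outside D."""
    (a, b), (c, d) = x_range, y_range
    hx, hy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx
        for j in range(n):
            y = c + (j + 0.5) * hy
            if inside(x, y):              # the extended integrand equals A on D, 0 outside
                total += A(x, y) * hx * hy
    return total


def A(x, y):
    return x                              # hypothetical integrand


def in_triangle(x, y):
    return x >= 0 and y >= 0 and x + y <= 1   # hypothetical D; the exact integral is 1/6


estimate = domain_sum(A, in_triangle, 400)
print(estimate, 1 / 6)
```

The error here is of order 1/n and comes from the small rectangles cut by the edge x + y = 1, in line with the role the boundary rectangles play in §2.3.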

The definition of §2.3 does not apply, however, to 
integrals over curves in the plane, over curves in space, or 
over surfaces in space. Such integrals will be defined in 
this section by assuming that the domain of integration 
can be described parametrically and by defining the 
integral as an integral over the parameter space. 

Let the curve S in the xy-plane represent a displacement
of a particle. Then S can be described by a pair of
functions $(x(t), y(t))$ giving the coordinates of the
particle as functions of time, these functions being defined
for the time interval $D = \{a \le t \le b\}$ during which
the displacement S occurs. A curve represented in such a
way is called a curve defined by a parameter, as opposed
to a curve defined by an equation. (For example, the curve
$\{(\cos t, \sin t): 0 \le t \le 2\pi\}$ is a curve defined by a
parameter, whereas the set $\{(x, y): x^2 + y^2 = 1\}$ is a
curve defined by an equation.) Then a polygonal approximation
to S can be constructed by giving a subdivision
$a = t_0 < t_1 < t_2 < \cdots < t_{n-1} < t_n = b$ of the parameterizing
time interval D and drawing the polygonal
curve from $(x(t_0), y(t_0))$ to $(x(t_1), y(t_1))$ to $\ldots$ to
$(x(t_n), y(t_n))$. The corresponding sum approximating
$\int_S (A\,dx + B\,dy)$ is $\sum (A\,\Delta x_i + B\,\Delta y_i)$, where
$\Delta x_i = x(t_i) - x(t_{i-1})$, $\Delta y_i = y(t_i) - y(t_{i-1})$, and where the
functions A, B are evaluated at some point in the
neighborhood of the line segment from $(x(t_{i-1}), y(t_{i-1}))$
to $(x(t_i), y(t_i))$, for instance at the point $(x(\tau_i), y(\tau_i))$,
where $\tau_i$ is a point in the ith interval $t_{i-1} \le \tau_i \le t_i$.
Thus $\int_S (A\,dx + B\,dy)$ is approximately

$$\sum_{i=1}^{n} \big[A(x(\tau_i), y(\tau_i))\,\Delta x_i + B(x(\tau_i), y(\tau_i))\,\Delta y_i\big] = \sum_{i=1}^{n} \Big[A\frac{\Delta x_i}{\Delta t_i} + B\frac{\Delta y_i}{\Delta t_i}\Big]\Delta t_i \qquad (1)$$

where $\Delta t_i = t_i - t_{i-1}$ = value of dt on the directed
interval $t_{i-1}t_i$. But the sum on the right is approximately
the integral of the 1-form

$$\Big[A(x(t), y(t))\frac{dx}{dt}(t) + B(x(t), y(t))\frac{dy}{dt}(t)\Big]\,dt$$

over the oriented interval from a to b. This integral has
been defined and converges provided that the functions
$A(x, y)$, $B(x, y)$ are continuous, which is assumed already,
and provided that the functions $\frac{dx}{dt}$, $\frac{dy}{dt}$
exist and are continuous, which will be assumed henceforth.
Therefore $\int_S (A\,dx + B\,dy)$ can be defined to be

$$\int_D \Big[A\frac{dx}{dt} + B\frac{dy}{dt}\Big]\,dt,$$

and equation (1) shows that the resulting number corresponds
to the intuitive description of $\int_S (A\,dx + B\,dy)$ given in §2.2.
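Equation (1) can be checked numerically. The Python sketch below (hypothetical helper names) computes the polygonal sum $\sum [A\,\Delta x_i + B\,\Delta y_i]$ with $\tau_i$ the midpoint of each parameter interval, and separately a midpoint-rule value of $\int_D [A\,dx/dt + B\,dy/dt]\,dt$. The example 1-form $-y\,dx + x\,dy$ on the unit circle is our own choice; for it the pulled-back integrand is identically 1, so the integral is $2\pi$.

```python
import math


def polygonal_sum(A, B, x, y, a, b, n):
    """Sum of A*dx_i + B*dy_i over the polygon through (x(t_i), y(t_i)), t_i = a + i(b - a)/n,
    with A and B evaluated at (x(tau_i), y(tau_i)), tau_i the midpoint of [t_(i-1), t_i]."""
    total = 0.0
    for i in range(1, n + 1):
        t0 = a + (i - 1) * (b - a) / n
        t1 = a + i * (b - a) / n
        tau = 0.5 * (t0 + t1)
        px, py = x(tau), y(tau)
        total += A(px, py) * (x(t1) - x(t0)) + B(px, py) * (y(t1) - y(t0))
    return total


def pullback_integral(A, B, x, y, dxdt, dydt, a, b, n):
    """Midpoint-rule value of the integral over [a, b] of
    [A(x(t), y(t)) dx/dt + B(x(t), y(t)) dy/dt] dt."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += (A(x(t), y(t)) * dxdt(t) + B(x(t), y(t)) * dydt(t)) * h
    return total


def A(x, y):
    return -y   # example 1-form -y dx + x dy (a hypothetical choice)


def B(x, y):
    return x


def minus_sin(t):
    return -math.sin(t)   # dx/dt for x(t) = cos t


n = 1000
poly = polygonal_sum(A, B, math.cos, math.sin, 0.0, 2 * math.pi, n)
pulled = pullback_integral(A, B, math.cos, math.sin, minus_sin, math.cos, 0.0, 2 * math.pi, n)
print(poly, pulled, 2 * math.pi)
```

For this example each term of the polygonal sum equals $2\sin(\pi/n)$, so the sum is $2n\sin(\pi/n)$, which approaches $2\pi$ as the subdivision is refined.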

A slightly different way of arriving at the same result
is to interpret $A\,dx + B\,dy$ as representing ‘work’ and to
ask how much work is done during a short time interval.
This is approximately

$$\Big[A(x(\tau_i), y(\tau_i))\frac{\Delta x_i}{\Delta t_i} + B(x(\tau_i), y(\tau_i))\frac{\Delta y_i}{\Delta t_i}\Big]\Delta t_i,$$

and the shorter the time interval the better the approximation;
that is,

$$\text{work done during short time intervals} \approx \Big[A\frac{dx}{dt} + B\frac{dy}{dt}\Big]\,dt.$$

The work done during a time interval which is not short
is then found by integration:

$$\text{work done during the time interval } D = \int_D \Big[A\frac{dx}{dt} + B\frac{dy}{dt}\Big]\,dt.$$

Seen in this light, what is involved is a pullback operation
in which a 1-form on the t-line is obtained from the
1-form $A\,dx + B\,dy$ on the xy-plane and a map of the
t-line to the xy-plane. The 1-form $\big(A\frac{dx}{dt} + B\frac{dy}{dt}\big)\,dt$
is called the pullback of the (variable) 1-form under the
(non-affine) map $x = x(t)$, $y = y(t)$.

The justification for defining

$$\int_S (A\,dx + B\,dy + C\,dz) = \int_D \Big[A\frac{dx}{dt} + B\frac{dy}{dt} + C\frac{dz}{dt}\Big]\,dt$$

when S is a parametric curve $\{(x(t), y(t), z(t)): t \text{ in } D\}$
in which the functions x, y, z have continuous derivatives
is exactly the same. The 1-form

$$\Big[A\frac{dx}{dt} + B\frac{dy}{dt} + C\frac{dz}{dt}\Big]\,dt$$

on the t-line is called the pullback of the (variable) 1-form
$A\,dx + B\,dy + C\,dz$ under the (non-affine) map
$x = x(t)$, $y = y(t)$, $z = z(t)$.

Similarly, let S be a surface which is given in the form

$$S = \{(x(u, v), y(u, v), z(u, v)): (u, v) \text{ in } D\},$$

where $x(u, v)$, $y(u, v)$, $z(u, v)$ are given functions of two
variables u, v defined on a domain D of the uv-plane.
Such a surface is called a surface defined by parameters,
as opposed to a surface defined by an equation. (For
example, the surface

$$S = \{(\cos\theta\cos\phi, \sin\theta\cos\phi, \sin\phi): 0 \le \theta \le 2\pi,\ -\tfrac{\pi}{2} \le \phi \le \tfrac{\pi}{2}\}$$

is a surface defined by parameters, whereas the set
$\{(x, y, z): x^2 + y^2 + z^2 = 1\}$ is a surface defined by
an equation.) The integral of a 2-form over such a surface,

$$\int_S (A\,dy\,dz + B\,dz\,dx + C\,dx\,dy),$$

will be defined to be the integral of a pulled-back
2-form over the parameter space D. This 2-form on D is
to assign to a small polygon in D the approximate value
of $A\,dy\,dz + B\,dz\,dx + C\,dx\,dy$ on its image.

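To find such a 2-form on D, consider a particular point
$(\bar u, \bar v)$ in D. Defining the partial derivatives, as usual, by

$$\frac{\partial x}{\partial u}(\bar u, \bar v) = \lim_{u \to \bar u} \frac{x(u, \bar v) - x(\bar u, \bar v)}{u - \bar u}, \qquad \frac{\partial x}{\partial v}(\bar u, \bar v) = \lim_{v \to \bar v} \frac{x(\bar u, v) - x(\bar u, \bar v)}{v - \bar v},$$

it is to be expected that the approximation

$$x(u, v) \approx x(\bar u, \bar v) + \frac{\partial x}{\partial u}(\bar u, \bar v)(u - \bar u) + \frac{\partial x}{\partial v}(\bar u, \bar v)(v - \bar v)$$

holds for (u, v) near $(\bar u, \bar v)$; that is, the given function
$x(u, v)$ is well approximated by the affine function on the
right. The given functions $y(u, v)$ and $z(u, v)$ can be
approximated in the same way, leading to the conclusion
that the map $x = x(u, v)$, $y = y(u, v)$, $z = z(u, v)$ defining
the surface is well approximated near $(\bar u, \bar v)$ by the
affine map

$$\begin{aligned}
x &= x(\bar u, \bar v) + \frac{\partial x}{\partial u}(\bar u, \bar v)(u - \bar u) + \frac{\partial x}{\partial v}(\bar u, \bar v)(v - \bar v)\\
y &= y(\bar u, \bar v) + \frac{\partial y}{\partial u}(\bar u, \bar v)(u - \bar u) + \frac{\partial y}{\partial v}(\bar u, \bar v)(v - \bar v) \qquad (2)\\
z &= z(\bar u, \bar v) + \frac{\partial z}{\partial u}(\bar u, \bar v)(u - \bar u) + \frac{\partial z}{\partial v}(\bar u, \bar v)(v - \bar v).
\end{aligned}$$

The image of a small polygon in D under the actual map
is nearly its image under the affine map (2), and the
value of $A\,dy\,dz + B\,dz\,dx + C\,dx\,dy$ on its image is
therefore nearly the value of the pullback of

$$A(\bar x, \bar y, \bar z)\,dy\,dz + B(\bar x, \bar y, \bar z)\,dz\,dx + C(\bar x, \bar y, \bar z)\,dx\,dy \qquad (3)$$

on the polygon itself (where $\bar x = x(\bar u, \bar v)$, etc.). It is
reasonable to define the pullback of $A\,dy\,dz + B\,dz\,dx + C\,dx\,dy$
under the map $x = x(u, v)$, $y = y(u, v)$, $z = z(u, v)$ to be the
2-form in uv whose value at any point $(\bar u, \bar v)$ is the pullback
of the constant form (3) under the affine map (2).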

This (variable) 2-form in uv is easily computed. One
need only set

$$x = x(u, v), \quad y = y(u, v), \quad z = z(u, v),$$

$$dx = \frac{\partial x}{\partial u}du + \frac{\partial x}{\partial v}dv, \quad dy = \frac{\partial y}{\partial u}du + \frac{\partial y}{\partial v}dv, \quad dz = \frac{\partial z}{\partial u}du + \frac{\partial z}{\partial v}dv,$$

substitute these expressions in $A(x, y, z)\,dy\,dz + B(x, y, z)\,dz\,dx + C(x, y, z)\,dx\,dy$,
and simplify using the usual rules $du\,du = 0$, $du\,dv = -dv\,du$.
For example, the pullback of $x\,dy\,dz - y^2\,dx\,dy$ under the map

$$x = e^u + v, \quad y = u + 2v, \quad z = \cos u$$

is

$$(e^u + v)(du + 2\,dv)(-\sin u\,du) - (u + 2v)^2(e^u\,du + dv)(du + 2\,dv) = [2(e^u + v)\sin u + (u + 2v)^2(1 - 2e^u)]\,du\,dv.$$
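The computational rules can be mechanized. The Python sketch below is a hypothetical mini-implementation (not a library API): it stores a form in du, dv as a dictionary from increasing index tuples to coefficients, and multiplies forms by concatenating indices, discarding any product with a repeated differential ($du\,du = 0$) and changing sign once for each transposition needed to sort the indices ($dv\,du = -du\,dv$). Evaluated at a point, it reproduces the coefficient of $du\,dv$ found above.

```python
import math


def wedge(alpha, beta):
    """Product of two forms in du, dv stored as {index tuple: coefficient}, where index 0
    stands for du and 1 for dv. Uses du du = dv dv = 0 and dv du = -du dv."""
    result = {}
    for i, a in alpha.items():
        for j, b in beta.items():
            idx = list(i + j)
            if len(set(idx)) < len(idx):   # a repeated differential gives 0
                continue
            sign = 1
            # Bubble sort the indices, flipping the sign at each transposition.
            for k in range(len(idx)):
                for m in range(len(idx) - 1 - k):
                    if idx[m] > idx[m + 1]:
                        idx[m], idx[m + 1] = idx[m + 1], idx[m]
                        sign = -sign
            key = tuple(idx)
            result[key] = result.get(key, 0.0) + sign * a * b
    return result


def combine(*terms):
    """Sum of c * form over the (coefficient, form) pairs given."""
    out = {}
    for c, form in terms:
        for key, value in form.items():
            out[key] = out.get(key, 0.0) + c * value
    return out


def pullback_example(u, v):
    """Coefficient of du dv in the pullback of x dy dz - y^2 dx dy under
    x = e^u + v, y = u + 2v, z = cos u, at the point (u, v)."""
    x, y = math.exp(u) + v, u + 2 * v
    dx = {(0,): math.exp(u), (1,): 1.0}   # dx = e^u du + dv
    dy = {(0,): 1.0, (1,): 2.0}           # dy = du + 2 dv
    dz = {(0,): -math.sin(u)}             # dz = -sin u du
    form = combine((x, wedge(dy, dz)), (-y ** 2, wedge(dx, dy)))
    return form.get((0, 1), 0.0)


u, v = 0.3, -0.7
closed_form = 2 * (math.exp(u) + v) * math.sin(u) + (u + 2 * v) ** 2 * (1 - 2 * math.exp(u))
print(pullback_example(u, v), closed_form)
```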


Note that the pullbacks of 1-forms above were found
by the same method: the pullback of $A(x, y)\,dx + B(x, y)\,dy$
was found merely by performing the substitutions
$x = x(t)$, $y = y(t)$, $dx = \frac{dx}{dt}\,dt$, $dy = \frac{dy}{dt}\,dt$
to ‘express $A\,dx + B\,dy$ in terms of t’.

Thus the pullback of a (variable) k-form under a (not
necessarily affine) map can be defined by these computational
rules

$$dx = \frac{\partial x}{\partial u}du + \frac{\partial x}{\partial v}dv, \quad du\,du = 0, \quad du\,dv = -dv\,du, \text{ etc.}, \qquad (4)$$

and it is reasonable to interpret the resulting k-form in
uv as the k-form whose value on a small k-dimensional
figure in uv-space is equal to the value of the given
k-form on the image of this figure under the given map.

Using the definition of pullback by the computational 
rules (4), one can define the integral of a k-form over a 
k-dimensional domain which is defined by parameters 
to be the integral of the pullback over the parameterizing 
domain D. [Assuming that the partial derivatives of the 
given map exist and are continuous, the pullback is a 
continuous k-form on a k-dimensional space; hence the 
integral is defined and converges whenever the param- 
eterizing domain D is reasonable in the sense of §2.3.] 

To prove that this definition of integrals over curves 
and surfaces has all the desired properties is a rather long 
task (see Chapter 6). What is important for the moment 
is the computation of pullbacks and an intuitive under- 
standing of the relation of this operation to the notion of 
integration as described in §2.2. 


Exercises


1 Find the pullbacks of


(a) $x\,dy\,dz$ under $x = \cos uv$, $y = \sin uv$, $z = uv^2$
(b) $xy\,dz\,dx$ under $x = u\cos v$, $y = u + v$, $z = u\sin u$
(c) z3 dx dy under x = e" + v, y = eX —v,z=2 


2 If $x\,dy + y\,dx$ gives the work required for small displacements
in the plane, and if (x, y) = (√t, À for t > 0 gives
position as a function of time, find the rate of work as a function
of time (= work done/time elapsed, as elapsed time → 0).
Graph the motion by drawing the curve along which the
particle moves and labeling points with the time at which
they are passed. Write the amount of work done between
times t = 1 and t = 4 as an integral.


3 The mapping

$$x = \frac{2u}{u^2 + v^2 + 1}, \quad y = \frac{2v}{u^2 + v^2 + 1}, \quad z = \frac{u^2 + v^2 - 1}{u^2 + v^2 + 1}$$

arises from the stereographic projection of the sphere
$x^2 + y^2 + z^2 = 1$ onto the plane z = 0. Specifically, the point
(x, y, z) given by these formulas is the unique point of the
sphere other than (0, 0, 1) which lies on the line through
(u, v, 0) and (0, 0, 1). Check this fact and show that every point
of the sphere except (0, 0, 1) corresponds to exactly one point of
the plane. Find the pullback under this map of the 2-form
$x\,dy\,dz + y\,dz\,dx + z\,dx\,dy$. [The computation is long but the
answer is simple.]


4 Find the pullback of $x\,dy\,dz + y\,dz\,dx + z\,dx\,dy$ under
the map

$$x = \cos\theta\cos\phi, \quad y = \sin\theta\cos\phi, \quad z = \sin\phi$$

giving spherical coordinates on $x^2 + y^2 + z^2 = 1$.


5 Write the integral of Exercise 4, §2.2, as an integral over
$\{0 \le \theta \le 2\pi\}$.


6 Find the pullback of $dx\,dy$ under the map

$$x = r\cos\theta, \quad y = r\sin\theta$$

giving polar coordinates on the xy-plane. What is the approximate
area of a ring-shaped region $\{r_1^2 \le x^2 + y^2 \le r_2^2\}$?
What are the orientations when r < 0?


2.5 Independence of Parameter


When the domain of integration is described parametrically
the integral can be defined, as in the previous
section, to be the integral of the pullback of the integrand
over the parameterizing domain. However, it is frequently
necessary, as will be seen in Chapter 3, to deal
with integrals over oriented domains which are not
described in this way, to deal, for example, with integrals
of 2-forms over the sphere $S = \{x^2 + y^2 + z^2 = 1\}$.
If the sphere is oriented by the convention ‘counter- 
clockwise as seen from the outside’ (a mathematical 
description of this convention is given below) then the 
intuitive description of the integral given in §2.2 is 
applicable but the exact definition of §2.4 is not. The 
solution of this problem is simply to parameterize the 
domain of integration, but this raises some very difficult 
questions: 


(i) When can a domain be parameterized? 

(ii) What, precisely, does it mean to say that a para- 
metric domain is a parameterization of a given 
domain? 




(iii) If a domain is parameterized in two different 
ways, is the integral the same? 


Rigorous answers to these questions will not be given 
until Chapter 6. In general, the first two questions are not 
important in practice, and the answer to the third 
question, which is very important in practice, is ‘yes’ 
under very broad assumptions. 

For example, consider some parameterizations of the
sphere $\{x^2 + y^2 + z^2 = 1\}$. A very common one is
given by spherical coordinates

$$x = r\cos\theta\cos\phi, \quad y = r\sin\theta\cos\phi, \quad z = r\sin\phi$$

on the rectangle $r = 1$, $0 \le \theta \le 2\pi$, $-\frac{\pi}{2} \le \phi \le \frac{\pi}{2}$.
Denoting this rectangle by D and the sphere by S, a
mapping from D to S has been given which can be seen
geometrically to cover all of S; in fact, the lines θ = const.
are the meridians of longitude, φ = const. are the
parallels of latitude, and these coordinates are the usual
ones for locating points on the earth’s surface.

To integrate a 2-form over this parametric surface
it is necessary to orient the parameterizing domain
$D = \{r = 1,\ 0 \le \theta \le 2\pi,\ -\frac{\pi}{2} \le \phi \le \frac{\pi}{2}\}$, and this is to
be done in such a way that it gives the orientation
‘counterclockwise as seen from the outside’ to the
sphere. A few pictures will suffice to show that the correct
orientation of D is ‘counterclockwise’ when θ, φ are
drawn as shown, but the same result can be reached
without the aid of pictures as follows. An orientation of
the θφ-plane can be described by specifying either $d\theta\,d\phi$
or $d\phi\,d\theta$ and saying that a triangle is positively oriented
if the value of the specified 2-form on it is positive; thus
$d\theta\,d\phi \leftrightarrow$ counterclockwise and $d\phi\,d\theta \leftrightarrow$ clockwise.
Similarly, an orientation of the sphere can be described by
specifying a non-zero 2-form and by saying that a
small triangle on the sphere is positively oriented
if the value of the specified 2-form on it is positive. In this
particular example the 2-form $x\,dy\,dz + y\,dz\,dx + z\,dx\,dy$
describes the given orientation of the sphere; it
is $dx\,dy$ at (0, 0, 1), $dy\,dz$ at (1, 0, 0), $-dx\,dy$ at (0, 0, −1),
$dz\,dx$ at (0, 1, 0), etc., all of which can be seen to describe
the orientation ‘counterclockwise as seen from the
outside’ near the points in question. The pullback of
$x\,dy\,dz + y\,dz\,dx + z\,dx\,dy$ to the θφ-plane therefore will show
how the parameter space should be oriented; this pullback
is of the form $f(\theta, \phi)\,d\theta\,d\phi$, where $f(\theta, \phi)$ is a function
whose values in the rectangle

$$D = \{0 \le \theta \le 2\pi,\ -\tfrac{\pi}{2} < \phi < \tfrac{\pi}{2}\}$$

are all positive (see Exercise 4, §2.4). Thus the orientation
of the parameterized sphere by $d\theta\,d\phi$ agrees with the
orientation of the sphere by $x\,dy\,dz + y\,dz\,dx + z\,dx\,dy$,
which is the orientation described verbally by the phrase
‘counterclockwise as seen from outside’.
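The positivity of $f(\theta, \phi)$ can be checked numerically. On a pair of vectors a, b the 2-form $x\,dy\,dz + y\,dz\,dx + z\,dx\,dy$ at the point X = (x, y, z) has the value $X \cdot (a \times b)$, so f is $X \cdot (X_\theta \times X_\phi)$. The Python sketch below (hypothetical names; partial derivatives approximated by central differences) samples f on the open rectangle and also sums it over D. On the unit sphere this 2-form assigns to a small positively oriented piece its area, so the sum should be close to $4\pi$, the area of the sphere.

```python
import math


def sphere(theta, phi):
    return (math.cos(theta) * math.cos(phi), math.sin(theta) * math.cos(phi), math.sin(phi))


def pullback_coefficient(theta, phi, h=1e-6):
    """f(theta, phi) in the pullback f d(theta) d(phi) of x dy dz + y dz dx + z dx dy,
    computed as X . (X_theta x X_phi) with central-difference partial derivatives."""
    X = sphere(theta, phi)
    Xt = [(p - m) / (2 * h) for p, m in zip(sphere(theta + h, phi), sphere(theta - h, phi))]
    Xp = [(p - m) / (2 * h) for p, m in zip(sphere(theta, phi + h), sphere(theta, phi - h))]
    cross = (Xt[1] * Xp[2] - Xt[2] * Xp[1],
             Xt[2] * Xp[0] - Xt[0] * Xp[2],
             Xt[0] * Xp[1] - Xt[1] * Xp[0])
    return sum(a * b for a, b in zip(X, cross))


n = 100
d_theta, d_phi = 2 * math.pi / n, math.pi / n
# Values at the centres of an n-by-n subdivision of D (these centres avoid phi = +-pi/2).
samples = [pullback_coefficient((i + 0.5) * d_theta, -math.pi / 2 + (j + 0.5) * d_phi)
           for i in range(n) for j in range(n)]
print("all positive:", min(samples) > 0)
print("approximate integral over D:", sum(samples) * d_theta * d_phi, "4*pi =", 4 * math.pi)
```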

A second method of parameterizing the sphere is to
take x, y as coordinates, that is, to project onto the
xy-plane. Each point inside the disk

$$D = \{(x, y): x^2 + y^2 < 1\}$$

corresponds to two points on the sphere, one on the
upper hemisphere and one on the lower hemisphere
(except for points on the equator), leading to the parameterization
of S by two parametric surfaces

$$S^+ = \{(x, y, \sqrt{1 - x^2 - y^2}): x^2 + y^2 < 1\},$$
$$S^- = \{(x, y, -\sqrt{1 - x^2 - y^2}): x^2 + y^2 < 1\}.$$

Since the orientation of S is described by $dx\,dy$ at
(0, 0, 1) and by $dy\,dx$ at (0, 0, −1), these parametric
surfaces should be oriented by orienting the domain
$D = \{(x, y): x^2 + y^2 < 1\}$ using $dx\,dy$ for $S^+$ and
$dy\,dx$ for $S^-$. This method of parameterizing S has the
disadvantage that the parameterizing mappings do not
have continuous partial derivatives at the equator, so
that the pullbacks of a 2-form on S are not defined at
points of the boundary of D. The parameterization can
nonetheless be used to find the integral
$\int_S (A\,dy\,dz + B\,dz\,dx + C\,dx\,dy)$ of a 2-form over S by integrating
over

$$S^+_\epsilon = \{(x, y, \sqrt{1 - x^2 - y^2}): x^2 + y^2 \le 1 - \epsilon\}$$

oriented $dx\,dy$ and

$$S^-_\epsilon = \{(x, y, -\sqrt{1 - x^2 - y^2}): x^2 + y^2 \le 1 - \epsilon\}$$

oriented $dy\,dx$, adding the results, and letting $\epsilon \to 0$.
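Here is a Python sketch of the ε-procedure for the particular 2-form $z\,dx\,dy$ (that is, A = B = 0, C = z; our choice of example). On $S^+_\epsilon$ the pullback is $\sqrt{1 - x^2 - y^2}\,dx\,dy$ on a domain oriented by $dx\,dy$; on $S^-_\epsilon$ it is $-\sqrt{1 - x^2 - y^2}\,dx\,dy$ on a domain oriented by $dy\,dx$, and the two sign changes cancel. Both pieces therefore contribute the same positive amount, and the exact total is $\frac{4\pi}{3}(1 - \epsilon^{3/2})$, which tends to $4\pi/3$, twice the volume $2\pi/3$ of a half-ball. The disk integral is approximated by midpoint sums on an enclosing square (a hypothetical helper, `disk_integral`).

```python
import math


def disk_integral(eps, n):
    """Midpoint sums for the integral of sqrt(1 - x^2 - y^2) dx dy over the disk
    x^2 + y^2 <= 1 - eps (oriented dx dy), using the enclosing square [-1, 1]^2."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1 + (i + 0.5) * h
        for j in range(n):
            y = -1 + (j + 0.5) * h
            if x * x + y * y <= 1 - eps:
                total += math.sqrt(1 - x * x - y * y) * h * h
    return total


results = {}
for eps in (0.1, 0.01, 0.001):
    upper = disk_integral(eps, 400)   # S+_eps: pullback +sqrt(1 - x^2 - y^2) dx dy, oriented dx dy
    lower = -(-upper)                 # S-_eps: integrand negated, and dy dx = -dx dy
    results[eps] = upper + lower
    print(eps, results[eps], 4 * math.pi / 3 * (1 - eps ** 1.5), 4 * math.pi / 3)
```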




A third parameterization of the sphere is given by the
stereographic projection

$$\Big\{\Big(\frac{2u}{u^2 + v^2 + 1}, \frac{2v}{u^2 + v^2 + 1}, \frac{u^2 + v^2 - 1}{u^2 + v^2 + 1}\Big): \text{all } u, v\Big\}$$

(see Exercise 3, §2.4). The reader will easily find the
orientation of the uv-plane which corresponds to the
given orientation of S (Exercise 5). This parameterization
has the disadvantage that the point (0, 0, 1) is omitted.
It can nonetheless be used to find the integral of a 2-form
over S by integrating the pullback over $u^2 + v^2 \le R^2$
and letting $R \to \infty$. Alternatively, the stereographic
projection can be used to parameterize the lower hemisphere
$\{u^2 + v^2 \le 1\}$ and its mirror image used to
parameterize the upper hemisphere.

These various parametric representations lead
to various methods of computing ‘the number
$\int_S (A\,dy\,dz + B\,dz\,dx + C\,dx\,dy)$’. The fact that all of
them result in the same value requires proof because no
definition of ‘the number $\int_S (A\,dy\,dz + B\,dz\,dx + C\,dx\,dy)$’
has been given other than these methods of
computing it. This fact, that integrals over domains can
be computed using any parameterization of the domain,
is called the principle of independence of parameter.

Another example of the principle of independence of
parameter is contained in the rule for conversion to polar
coordinates in a double integral. For example, if D is the
disk $D = \{(x, y): x^2 + y^2 \le 1\}$ oriented counterclockwise,
then D is parameterized in polar coordinates

$$x = r\cos\theta, \quad y = r\sin\theta$$

by the rectangle $0 \le r \le 1$, $0 \le \theta \le 2\pi$ oriented
counterclockwise. Therefore the integral of $A(x, y)\,dx\,dy$ over
the disk is equal to the integral of the pullback

$$A(r\cos\theta, r\sin\theta)(\cos\theta\,dr - r\sin\theta\,d\theta)(\sin\theta\,dr + r\cos\theta\,d\theta) = A(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta$$

over the rectangle. The orientation $dx\,dy$ corresponds to
the orientation $dr\,d\theta$ because r > 0.
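A Python sketch of this instance of independence of parameter (the helper names and the integrand $A = x^2 + y^2$ are hypothetical choices): the same integral is approximated once by Cartesian midpoint sums on the enclosing square $[-1, 1]^2$, with the integrand set to zero outside the disk, and once by midpoint sums for the pulled-back form $A(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta$ over the rectangle. Both should approach $\pi/2$.

```python
import math


def A(x, y):
    return x * x + y * y   # hypothetical integrand; its integral over the unit disk is pi/2


def cartesian_sum(n):
    """Midpoint sums over the enclosing square [-1, 1]^2, integrand set to 0 outside the disk."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1 + (i + 0.5) * h
        for j in range(n):
            y = -1 + (j + 0.5) * h
            if x * x + y * y < 1:
                total += A(x, y) * h * h
    return total


def polar_sum(n):
    """Midpoint sums for A(r cos t, r sin t) r dr dt over 0 <= r <= 1, 0 <= t <= 2 pi."""
    hr, ht = 1.0 / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            t = (j + 0.5) * ht
            total += A(r * math.cos(t), r * math.sin(t)) * r * hr * ht
    return total


cart, polar = cartesian_sum(400), polar_sum(400)
print(cart, polar, math.pi / 2)
```

The polar sums converge much faster in this example because the parameter rectangle has no curved boundary for the small rectangles to straddle.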

Similarly, the integral of a 1-form over an oriented
curve is independent of the choice of parameter. For
example, if S is the curve $y = x^2$ between the points
(1, 1) and (2, 4), oriented from the first toward the second,
then the integral of $A\,dx + B\,dy$ over S can be computed
using either the parameterization $S = \{(x, x^2): 1 \le x \le 2\}$
or the parameterization $S = \{(\sqrt{y}, y): 1 \le y \le 4\}$,
oriented by dx and dy respectively.
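For a concrete check, take the hypothetical 1-form $y\,dx$ (A = y, B = 0). With parameter x the pullback is $x^2\,dx$ on $1 \le x \le 2$; with parameter y it is $\frac{1}{2}\sqrt{y}\,dy$ on $1 \le y \le 4$ (since $x = \sqrt{y}$ gives $dx = dy/(2\sqrt{y})$). Both integrals equal 7/3. The Python sketch below approximates each by the midpoint rule.

```python
import math


def midpoint_integral(f, a, b, n=2000):
    """Midpoint-rule approximation to the integral of f(t) dt over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


def line_integral(A, B, x, y, dxdt, dydt, a, b):
    """Integral of the pullback [A(x(t), y(t)) x'(t) + B(x(t), y(t)) y'(t)] dt over [a, b]."""
    return midpoint_integral(lambda t: A(x(t), y(t)) * dxdt(t) + B(x(t), y(t)) * dydt(t), a, b)


def A(x, y):
    return y    # hypothetical example: the 1-form y dx


def B(x, y):
    return 0.0


# Parameter x: (x, x^2), 1 <= x <= 2, oriented by dx.
via_x = line_integral(A, B, lambda t: t, lambda t: t * t, lambda t: 1.0, lambda t: 2 * t, 1.0, 2.0)
# Parameter y: (sqrt(y), y), 1 <= y <= 4, oriented by dy.
via_y = line_integral(A, B, math.sqrt, lambda t: t, lambda t: 0.5 / math.sqrt(t),
                      lambda t: 1.0, 1.0, 4.0)
print(via_x, via_y, 7 / 3)
```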

Plausible as these statements are in specific cases, it is 
rather difficult to give a precise definition of the state- 
ment that two parameterized surfaces are parameteriza- 
tions of the same surface, and more difficult still to prove 
rigorously that integrals are independent of parameter. 
Until these subjects are dealt with carefully (in Chapter 6) 
the notion of the integral of a form over an oriented 
surface is not defined until a particular parameterization 
of the oriented surface is given. However, this is strictly 
a technical difficulty; integrals are independent of 
parameter, and the informal description of integrals in 
§2.2 is nearer to their true meaning than is the precise 
definition of integrals over parameterized surfaces given 
in the preceding section. 


Exercises


1 Parameterize the surface $x + y + z = 1$. Orient the
parameterization so as to agree with the orientation given on
the original surface by $dy\,dz$; by $dz\,dx$; by $dx\,dy$. Sketch the
given surface showing its orientation.


2 Find the pullbacks of $dy\,dz$, $dz\,dx$, and $dx\,dy$ under each
of the parameterizations of the sphere considered: spherical
coordinates, projection on the xy-plane, and stereographic
projection.


3 Let $\int_a^b f(x)\,dx$ be a given integral. Let $x = x(u)$ be a
‘parameterization’ of the interval $\{a \le x \le b\}$ by a new
parameter u on an interval $\{\alpha \le u \le \beta\}$. That is, let $x(u)$ be
a differentiable function establishing a one-to-one correspondence
between points of the interval $\{\alpha \le u \le \beta\}$ and
points of the interval $\{a \le x \le b\}$. State the principle of
independence of parameter in this case. Pay particular attention
to the orientation. Apply this to the integral $\int_a^b x^n\,dx$
($a > 0$, $b > 0$) when $x = e^u$.


4 In order to find the correct orientation of spherical coordinates
θφ it is not necessary to compute the pullback of
$x\,dy\,dz + y\,dz\,dx + z\,dx\,dy$ at all points, since the pullback
at any one point is sufficient to determine the sign. The point
$(x, y, z) = (1, 0, 0)$, $(\theta, \phi) = (0, 0)$ is particularly simple. Find
the pullback at this point.


5 Find the correct orientation of the parameterization by
stereographic coordinates. [Find the pullback of
$x\,dy\,dz + y\,dz\,dx + z\,dx\,dy$ at the point (0, 0, −1).]


6 Parameterize the three pieces of the boundary of the
cylinder $\{(x, y, z): x^2 + y^2 \le 1, -1 \le z \le 1\}$. Orient each
of the three pieces by the rule ‘counterclockwise as seen from
the outside’.


7 Parameterize the surface obtained by rotating the circle
$(x - 2)^2 + z^2 = 1$, $y = 0$ about the z-axis. Orient the parameter
space to agree with the orientation of the given surface
by the rule ‘counterclockwise as seen from the outside’.
[First parameterize the given circle and rotate about the
z-axis using cylindrical coordinates $(r, \theta, z)$. Convert to
(x, y, z) coordinates using $x = r\cos\theta$, $y = r\sin\theta$, $z = z$.]
This surface is called a torus.


2.6 Summary. Basic Properties of Integrals


Chapter 1 was devoted to constant k-forms and to their
evaluation on simple k-dimensional figures such as
oriented line segments, triangles, parallelograms, and
cubes. This chapter has been devoted to (variable) k-forms
and to their evaluation on oriented k-dimensional
domains. The ‘value’ of a k-form on an oriented k-dimensional
domain has been defined as a limit of sums,
that is, as an integral.

The precise definition of the notion of ‘integral’ is
difficult for two reasons: first, because it involves the
notion of ‘limit’, and second, because it involves the
notion of ‘oriented k-dimensional domain’. The notion
of ‘limit’ is defined by the Cauchy Convergence Criterion,
which was discussed in detail in §2.3. The problem of
defining ‘oriented k-dimensional domain’ is more difficult
and has been avoided entirely by restricting consideration
to specific domains such as rectangles, disks, spheres,
etc., and to domains which are parameterized by such
domains.*

*The precise definition of ‘k-dimensional domain’, which is
given in Chapters 5 and 6, depends essentially on the Implicit
Function Theorem.

The following properties of integrals are all immediate
consequences of the definition of integrals as limits of
sums. They are stated specifically for integrals of the
form $\int_R A(x, y)\,dx\,dy$, where R is an oriented rectangle
in the xy-plane and $A(x, y)$ is a continuous function
defined at all points of R, but they all have analogs which
are true for integrals of k-forms on k-dimensional
domains in the general cases described in §2.2:


(i) If the orientation of the domain of integration is
reversed the integral changes sign: $\int_{-R} A\,dx\,dy = -\int_R A\,dx\,dy$.

(ii) If the domain of integration is divided into two
(or more) smaller rectangles oriented in accordance
with the orientation of the original, then the
integral over the whole is the sum of the integrals
over the parts:

$$\int_{R_1 + R_2} A\,dx\,dy = \int_{R_1} A\,dx\,dy + \int_{R_2} A\,dx\,dy.$$

(iii) If the integrand is multiplied by a constant
the integral is multiplied by the same constant:
$\int_R cA(x, y)\,dx\,dy = c\int_R A(x, y)\,dx\,dy$.

(iv) If the integrand is a sum of two (or more) terms
then the integral of the sum is the sum of the
integrals: $\int_R (A_1 + A_2)\,dx\,dy = \int_R A_1\,dx\,dy + \int_R A_2\,dx\,dy$.


If the rectangle R is subdivided into a very large number
of very small pieces, then, by (ii), $\int_R A\,dx\,dy$ is the
sum of the integrals over the individual pieces. The
integral over a very small rectangle is roughly the value
of A on the rectangle times the oriented area of the
rectangle. This is only roughly true because, of course,
A does not have a value on the rectangle but many values.
However, the assumption that A is continuous is the
assumption that A is nearly constant on sufficiently
small rectangles, so that the integral over such a rectangle
is nearly ‘the’ value of A times the oriented area.
Specifically, the definition of continuity of a function
easily implies the following:


(v) Given a point $(\bar x, \bar y)$ and given $\epsilon > 0$ there is a
$\delta > 0$ such that if R is any rectangle containing
$(\bar x, \bar y)$ and contained in the square $\{|x - \bar x| < \delta,
|y - \bar y| < \delta\}$, then $\int_R A\,dx\,dy$ differs from
$A(\bar x, \bar y)$ times the oriented area of R by less than
$\epsilon$ times the area $\left|\int_R dx\,dy\right|$ of R, that is,
$$\left|\frac{\int_R A(x,y)\,dx\,dy}{\int_R dx\,dy} - A(\bar x, \bar y)\right| < \epsilon.$$

A useful way to remember this statement is by means
of the formula
$$\lim_{R \to P} \frac{\int_R A\,dx\,dy}{\int_R dx\,dy} = A(P)$$
where the rectangle R is thought of as shrinking down to
the point P. This statement about integrals over small
rectangles, together with the subdivision property (ii), is
the substance of the intuitive idea of ‘integral’ as it is
described in §2.2.

Another type of formula which is frequently useful is
the formula for a double integral as an iterated integral
$$\int_R A(x,y)\,dx\,dy = \int_c^d \left[\int_a^b A(x,y)\,dx\right] dy \tag{1}$$
where A(x, y) is a continuous function on the rectangle
$R = \{a \le x \le b, c \le y \le d\}$, where R is oriented
counterclockwise, and where $\int_a^b A(x,y)\,dx$ is considered
as a function of y. For the proof of this formula see
Exercise 2.

The integral of a form over a domain which is described
parametrically is defined in terms of the pullback of the
form under the parameterizing map. The pullback operation,
which is a simple generalization of the pullback
operation for constant forms under affine maps, is
studied further in Chapter 5.

Exercises

1 Prove the properties (i)-(v) of $\int_R A\,dx\,dy$.


2 Prove that if A(x, y) is continuous on $R = \{a \le x \le b,
c \le y \le d\}$ then the integral on the right side of (1) converges
and the formula (1) holds. [Use the fact, stated at the end of
§2.3, that for every $\epsilon > 0$ there is a $\delta > 0$ such that
$|A(x, y) - A(\bar x, \bar y)| < \epsilon$ whenever $(x, y)$, $(\bar x, \bar y)$ are points of R
such that $|x - \bar x| < \delta$, $|y - \bar y| < \delta$. (This is Theorem 2 of §9.4.)
Then if $\sum(\omega)$ is an approximating sum to the integral on the
right side of (1) based on a subdivision of $\{c \le y \le d\}$
finer than δ, it follows that $\sum(\omega)$ differs by less than
$\epsilon(b - a)(d - c)$ from an approximating sum to $\int_R A\,dx\,dy$,
which in turn differs by less than $\epsilon(b - a)(d - c)$ from
$\int_R A\,dx\,dy$. Thus $\sum(\omega) \to \int_R A\,dx\,dy$, as was to be shown.]
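A short numerical sketch of formula (1) and of property (v), under assumptions of its own: the integrand `A`, the rectangle, the point P and all function names are illustrative, and midpoint sums stand in for the limits. With the same grid in both computations, the iterated sum is just a regrouping of the double sum, which is the discrete core of Exercise 2.

```python
import math

def A(x, y):
    # an illustrative continuous integrand (not from the text)
    return math.sin(x * y) + x ** 2

def midpoint_1d(f, lo, hi, n=400):
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

def double_sum(A, a, b, c, d, n=400):
    """One approximating sum for the double integral over {a<=x<=b, c<=y<=d}."""
    hx, hy = (b - a) / n, (d - c) / n
    return sum(A(a + (i + 0.5) * hx, c + (j + 0.5) * hy)
               for i in range(n) for j in range(n)) * hx * hy

def iterated(A, a, b, c, d, n=400):
    """Right side of (1): integrate over x for each fixed y, then over y."""
    def inner(y):
        return midpoint_1d(lambda x: A(x, y), a, b, n)
    return midpoint_1d(inner, c, d, n)

lhs = double_sum(A, 0.0, 1.0, 0.0, 2.0)
rhs = iterated(A, 0.0, 1.0, 0.0, 2.0)
print(lhs, rhs)

# Property (v): the average of A over a small square around P tends to A(P)
P = (0.3, 0.7)
errors = []
for delta in (0.1, 0.01, 0.001):
    area = (2 * delta) ** 2
    avg = double_sum(A, P[0] - delta, P[0] + delta,
                     P[1] - delta, P[1] + delta, n=20) / area
    errors.append(abs(avg - A(*P)))
print(errors)
```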


Integration and
differentiation

chapter 3

3.1 The Fundamental Theorem of Calculus

The evaluation of integrals in elementary calculus is
accomplished by the Fundamental Theorem of Calculus,
which can be stated as follows:


I. Let F(t) be a function for which the derivative F'(t)
exists and is a continuous function for t in the interval
$\{a \le t \le b\}$. Then
$$\int_a^b F'(t)\,dt = F(b) - F(a). \tag{1}$$

II. Let f(t) be a continuous function on $a \le t \le b$.
Then there exists a differentiable function F(t) on
$a \le t \le b$ such that $f(t) = F'(t)$.


Part I says that in order to evaluate a given integral it
suffices to write the integrand as a derivative so that the
desired integral is on the left side of equation (1) and a
known number is on the right. For example, to compute
the integral $\int (1/t^2)\,dt$ of Exercise 1, §2.1, it suffices to
write the integrand as
$$\frac{1}{t^2} = \frac{d}{dt}\left(-\frac{1}{t}\right)$$
so that (1) says
$$\int_a^b \frac{dt}{t^2} = \frac{1}{a} - \frac{1}{b}.$$

H.M. Edwards, Advanced Calculus: A Differential Forms Approach, Modern Birkhauser Classics,
DOI 10.1007/978-0-8176-8412-9_3, © Harold M. Edwards 2014

Part II says that theoretically this procedure always 
works, that is, theoretically any continuous integrand 
can be written as a derivative. Anyone who has been 
confronted with an integrand such as 


tf 
sin? ź 


f(t) = 


without the aid of a table of integrals, or 


l 

Í = mmama 
ID= aa 
with or without a table of integrals, knows how deceptive 
this statement is. In point of fact, II says little more than
that the (definite) integral of a continuous function over an
interval converges, which was already proved in §2.3
(see the proof of II below).

In the use of part I to evaluate integrals the right side 
is assumed known and the left side is thereby evaluated; 
the equation (1) can also be useful when read the other 
way around, i.e. when knowledge of the left side is used 
to draw conclusions about the right. For example, it says
that if F'(t) is identically zero then $F(b) - F(a) = 0$;
that is, a function whose derivative is zero is constant.
More generally, one value F(a) of the function and all
values F'(t) of its derivative suffice to determine all other
values
$$F(b) = \int_a^b F'(t)\,dt + F(a)$$
of the function.
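As an illustration of both readings of (1), here is a minimal Python sketch using the example $F(t) = -1/t$, $F'(t) = 1/t^2$; the interval [1, 3] and the midpoint rule used to approximate the left side are assumptions of the sketch.

```python
def midpoint_integral(f, a, b, n=10_000):
    """Approximating sum for the integral of f over [a, b] (midpoint rule)."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

def F(t):
    return -1.0 / t          # a function whose derivative is 1/t**2

def F_prime(t):
    return 1.0 / t ** 2

a, b = 1.0, 3.0
left_side = midpoint_integral(F_prime, a, b)   # integral of F'(t) dt
right_side = F(b) - F(a)                       # 1/a - 1/b = 2/3
print(left_side, right_side)

# Read the other way: F(a) and the values of F' determine F(b)
recovered_F_b = midpoint_integral(F_prime, a, b) + F(a)
print(recovered_F_b, F(b))
```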


Proof of I

The idea of the theorem is the following: Let $a = t_0 <
t_1 < t_2 < \cdots < t_n = b$ be a subdivision of the given
interval into small subintervals. Then
$$\begin{aligned}
F(b) - F(a) &= [F(t_n) - F(t_{n-1})] + [F(t_{n-1}) - F(t_{n-2})] + \cdots + [F(t_2) - F(t_1)] + [F(t_1) - F(t_0)] \\
&= \sum \Delta F = \sum \frac{\Delta F}{\Delta t}\,\Delta t
\end{aligned}$$
where $\sum$ denotes a sum over all subintervals of the subdivision,
and where for each subinterval $\{t_{i-1} \le t \le t_i\}$
the symbol $\Delta F$ denotes $F(t_i) - F(t_{i-1})$ and $\Delta t$ denotes
$t_i - t_{i-1}$. By the definition of ‘derivative’, the numbers
$\frac{\Delta F}{\Delta t}$ are nearly $F'(t)$ for t in the interval; hence, by the
definition of ‘integral’, the sum $\sum \frac{\Delta F}{\Delta t}\,\Delta t$ is nearly
$\int_a^b F'(t)\,dt$. Since the sum $\sum \frac{\Delta F}{\Delta t}\,\Delta t$ is exactly equal to
$F(b) - F(a)$, this is the statement to be proved.

To make this rough argument into a proof of I, one
must estimate the error in the approximations
$$\int_{t_{i-1}}^{t_i} F'(t)\,dt \approx F'(t)\,\Delta t \approx \frac{\Delta F}{\Delta t}\,\Delta t = \Delta F.$$
In doing this it is helpful to divide the difference between
$\int_{t_{i-1}}^{t_i} F'(t)\,dt$ and $\Delta F$ by $\Delta t$ and to estimate
$$\frac{1}{\Delta t}\left[\int_{t_{i-1}}^{t_i} F'(t)\,dt - \Delta F\right], \tag{2}$$
which can be thought of as the average difference per unit
length between the numbers $\int_{t_{i-1}}^{t_i} F'(t)\,dt$ and $\Delta F =
F(t_i) - F(t_{i-1})$. Assuming that the theorem is true, this
average difference per unit length is of course zero for all
subintervals $\{t_{i-1} \le t \le t_i\}$, and this is the statement to
be proved. If the interval is further subdivided, then the
maximum of this average, like any average, can only
increase; that is, the average on at least one of the smaller
intervals is at least as large in absolute value as the average
over the whole interval. Since the limit of (2) as $\Delta t \to 0$ is
$F'(t) - F'(t) = 0$, this observation will suffice to prove the theorem.

Specifically, for any r, s in the interval $a \le r < s \le b$
let $\mathcal{E}_{rs}$ denote
$$\mathcal{E}_{rs} = \frac{1}{s - r}\left\{\int_r^s F'(t)\,dt - [F(s) - F(r)]\right\}.$$
If c is the midpoint between a and b then
$$\begin{aligned}
\mathcal{E}_{ab} &= \frac{1}{b-a}\left\{\int_a^b F'(t)\,dt - [F(b) - F(a)]\right\} \\
&= \frac{1}{b-a}\left\{\int_a^c F'(t)\,dt + \int_c^b F'(t)\,dt - [F(b) - F(c)] - [F(c) - F(a)]\right\} \\
&= \frac{c-a}{b-a}\,\mathcal{E}_{ac} + \frac{b-c}{b-a}\,\mathcal{E}_{cb} \\
&= \tfrac{1}{2}(\mathcal{E}_{ac} + \mathcal{E}_{cb}).
\end{aligned}$$
Thus either $|\mathcal{E}_{ac}| \ge |\mathcal{E}_{ab}|$ or $|\mathcal{E}_{cb}| \ge |\mathcal{E}_{ab}|$; that is, the
average error is at least as great on (at least) one of the
two halves as it is on the whole interval. Dividing this
half into halves and repeating the argument shows that
there is a quarter of the original interval on which $|\mathcal{E}|$ is
at least $|\mathcal{E}_{ab}|$. Continuing this process ad infinitum gives a
sequence of intervals such that the ith interval is one of
the halves of the $(i - 1)$st (the first interval is
$\{a \le t \le b\}$) and such that the average error $\mathcal{E}_i$ per
unit length on the ith interval satisfies $|\mathcal{E}_i| \ge |\mathcal{E}_{i-1}|$. As
$i \to \infty$ the intervals shrink down to a point,* say T, and
$\mathcal{E}_i$ approaches
$$\lim_{i\to\infty} \frac{1}{\Delta t}\int F'(t)\,dt - \lim_{i\to\infty} \frac{\Delta F}{\Delta t} = F'(T) - F'(T) = 0$$
by (v) of §2.6 and by the definition of the derivative
F'(T). Thus $|\mathcal{E}_i| \ge |\mathcal{E}_{ab}|$ for all i and $\lim \mathcal{E}_i = 0$, which
implies $\mathcal{E}_{ab} = 0$. This completes the proof of I.

*The proof of this ‘obvious’ fact is
very subtle. See §9.4 and
Appendix 4.
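The bisection step in the proof can be watched numerically. In this hedged sketch the roles are deliberately mismatched: `g` plays the part of $F'$ but is not the derivative of `F`, so $\mathcal{E}_{ab} \ne 0$. The halving procedure then keeps $|\mathcal{E}|$ from shrinking, and the limit it reaches is $g(T) - F'(T) \ne 0$; when g really is $F'$ that limit is 0, which forces $\mathcal{E}_{ab} = 0$. All names and functions are illustrative.

```python
import math

def g(t):
    """Stand-in for F' in the proof; deliberately NOT the derivative of F."""
    return math.cos(t)

def G(t):
    """Exact antiderivative of g, so integrals of g are computed exactly."""
    return math.sin(t)

def F(t):
    """F'(t) = cos t + t, which differs from g(t) by t."""
    return math.sin(t) + t * t / 2

def E(r, s):
    """Average error per unit length on {r <= t <= s}, as in the proof."""
    return ((G(s) - G(r)) - (F(s) - F(r))) / (s - r)

a, b = 1.0, 2.0
c = (a + b) / 2
# The whole-interval error is the mean of the two half-interval errors
print(E(a, b), (E(a, c) + E(c, b)) / 2)

# Bisection: always keep a half on which |E| is at least as large
r, s = a, b
history = [abs(E(r, s))]
for _ in range(20):
    m = (r + s) / 2
    if abs(E(r, m)) >= abs(E(m, s)):
        s = m
    else:
        r = m
    history.append(abs(E(r, s)))

T = (r + s) / 2
print(T, E(r, s), g(T) - (math.cos(T) + T))   # limit is g(T) - F'(T), not 0
```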


Proof of II

Given a continuous function f(t) on $a \le t \le b$, the
integral defines a function
$$F(c) = \int_a^c f(t)\,dt \tag{3}$$
assigning numbers (the integral) to points c in the interval
$a \le c \le b$. It is to be shown that the function F so
defined is differentiable and that its derivative is f. But
since
$$\frac{F(t_1) - F(t_0)}{t_1 - t_0} = \frac{1}{t_1 - t_0}\int_{t_0}^{t_1} f(t)\,dt,$$
this follows immediately from (v) of §2.6.
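Part II can be illustrated with a function that is not ‘standard’: $f(t) = e^{-t^2}$ has no elementary antiderivative, yet the function F defined by (3) still has difference quotients approaching f. The sketch below approximates (3) by midpoint sums; the point $t_0 = 0.5$, the lower limit $a = 0$ and the step sizes are arbitrary choices.

```python
import math

def f(t):
    # a continuous integrand with no elementary antiderivative
    return math.exp(-t * t)

def F(c, a=0.0, n=20_000):
    """F(c) = integral of f from a to c, as in (3), via a midpoint sum."""
    h = (c - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

t0 = 0.5
errors = []
for dt in (1e-1, 1e-2, 1e-3):
    quotient = (F(t0 + dt) - F(t0)) / dt
    errors.append(abs(quotient - f(t0)))
print(errors)
```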


Statement II is confusing to many students because of
a misunderstanding about the word ‘function’. When one
thinks of a function one unconsciously imagines a simple
rule such as $F(t) = t^2$ or $F(t) = \sin\sqrt{t}$ which can be
evaluated by simple computation, by consultation of a
table, or, at worst, by a manageable machine computation.
The function defined by (3) need not be a standard
function at all, and a priori there is no reason to believe
that it can be evaluated by any means other than by
forming approximating sums and estimating the error as
in the preceding chapter, in which case the function F on
the right-hand side of (1) is just as difficult to evaluate as
the integral on the left. The method “write the integrand
as a derivative” is better stated “write the integrand as
the derivative of a function whose evaluation is easier
than the evaluation of the given integral by direct
computation.” Only when this is possible does the equation
(1) give a means of evaluating the integral.

Exercises

Interpretations of the Fundamental Theorem:


1 Give physical descriptions of the function F and the 1-form
F'(t) dt in such a way that equation (1) gives two ways
of finding the amount of work required to go from a to b.
Compare to Exercise 1, §2.2.

2 Give a physical description of the equation (1) in which
F(t) is the flow across points of a flow on the line as in
Exercise 4, §2.1. What is $F(b) - F(a)$? Give an interpretation
of $F(b) - F(a)$ on small intervals ($\approx F'(t)\,dt$) in terms of the
mass of the fluid. Describe equation (1) in terms of this
model.

3 Let F(t) be position as a function of time. Give physical
descriptions of the function F'(t), the 1-form F'(t) dt, and the
equation (1).

4 Let f(t) be a given function and let $A_{[a,b]}$ be the area
under the curve, that is, the area between the graph $(t, f(t))$
of the function and the interval [a, b] of the t-axis, counting
area as negative if the curve lies below the t-axis. Fix $t_0$ and
consider $A_{[t_0,b]} = F(b)$ as a function of b. Give a geometrical
interpretation of the 1-form F'(t) dt, and hence of the function
F'(t). Describe each side of (1) as an expression for $A_{[a,b]}$.
(This is an extremely awkward interpretation of the Fundamental
Theorem. See Exercise 12, §3.2.)


Analogy of the principle “to integrate, write the integrand as a
derivative” and the principle “to sum, write the summand as a
difference.”

In order to sum a finite series $f(1) + f(2) + f(3) +
\cdots + f(n)$ in which the summand f(n) is a function of
n, it suffices to find a function F(n) whose differences
$F(n) - F(n - 1)$ are equal to f(n) since then
$$f(1) + f(2) + \cdots + f(n) = [F(1) - F(0)] + [F(2) - F(1)] + \cdots + [F(n) - F(n - 1)] = F(n) - F(0).$$
Given an f(n) it is usually quite hard (much harder than finding
antiderivatives) to find a function F(n) such that $f(n) = F(n) - F(n - 1)$.
However, one can cheat and start with F, find its differences
$F(n) - F(n - 1)$, and see what series f(n) it enables one to
sum. This is the method of the following exercises:
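The principle “to sum, write the summand as a difference” can be tried out directly. This Python sketch (the helper `difference_sum` is hypothetical, introduced only for illustration) sums the differences of a chosen F term by term and compares the result with $F(n) - F(0)$; `Fraction` keeps the geometric case exact.

```python
from fractions import Fraction

def difference_sum(F, n):
    """Return (f(1) + ... + f(n), F(n) - F(0)) where f(k) = F(k) - F(k - 1)."""
    direct = sum(F(k) - F(k - 1) for k in range(1, n + 1))
    return direct, F(n) - F(0)

# F(n) = n**2 has differences f(n) = 2n - 1 (Exercise 8)
print(difference_sum(lambda k: k * k, 10))        # (100, 100)

# Adding n copies of 1 and halving gives 1 + 2 + ... + n = (n**2 + n) / 2
n = 10
print(sum(range(1, n + 1)) == (n * n + n) // 2)   # True

# F(n) = r**n (Exercise 7), with exact rational arithmetic
r = Fraction(1, 3)
_, closed = difference_sum(lambda k: r ** k, 6)
print(closed / (r - 1) == sum(r ** k for k in range(6)))  # True
```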


5 Set $F(n) = \cos(n\alpha + \beta)$ where α, β are fixed numbers.
Use the formula for $\cos(\alpha + \beta) - \cos(\alpha - \beta)$ to simplify
the difference $F(n) - F(n - 1)$. Write the result in the form
$C\sin(n\alpha + D)$ and find the formula for $\sin\beta + \sin(\alpha + \beta) +
\sin(2\alpha + \beta) + \cdots + \sin(n\alpha + \beta)$.

6 Find the formula for $\cos\beta + \cos(\alpha + \beta) + \cos(2\alpha + \beta) + \cdots + \cos(n\alpha + \beta)$.

7 Set $F(n) = r^n$ and show that this yields the sum of the
series $1 + r + r^2 + \cdots + r^{n-1}$ for $r \ne 1$.

8 Setting $F(n) = n^2$ gives the sum of f(n) where $f(n) =
2n - 1$. Since the sum $1 + 1 + \cdots + 1$ is n, this gives the
familiar formula for $1 + 2 + 3 + \cdots + n$.

9 Set $F(n) = n^3$ and use the formula of Exercise 8 to find
a formula for $1 + 2^2 + 3^2 + \cdots + n^2$.

10 Use the method of 9 to find the sum $1 + 2^3 + 3^3 +
\cdots + n^3$.

11 Use the same method to show that the sum $1 + 2^k +
3^k + \cdots + n^k$ can be written in the form $\frac{n^{k+1}}{k+1} + \cdots$
where the omitted terms are multiples of $n^k, n^{k-1}, \ldots, n^0$.


12 Let f(t) be a function on the interval $\{a \le t \le b\}$. Then
by definition the integral $\int_a^b f(t)\,dt$ is equal to the limit of the
sums $\frac{1}{N}\sum f\left(\frac{j}{N}\right)$ where the sum is over all integers j such
that $a < j/N < b$, that is, such that $Na < j < Nb$, and
where the limit is taken as $N \to \infty$. The formulas of Exercises
5-11 give explicit formulas for $\sum f\left(\frac{j}{N}\right)\frac{1}{N}$ when $f(t) =
\sin t$, $f(t) = \cos t$, $f(t) = r^t$, $f(t) = t^k$. Use these formulas
to write $\int_a^b f(t)\,dt$ as a limit as $N \to \infty$ in these cases. Evaluate
the limit; the formulas
$$\lim_{N\to\infty} \frac{\sin(1/N)}{1/N} = 1, \qquad \lim_{N\to\infty} \frac{r^{1/N} - 1}{1/N} = \log r$$
are needed.

13 If the function F(t) is monotone on $\{a \le t \le b\}$, i.e. if
$y = F(t)$ establishes a one-to-one correspondence between
the interval $\{a \le t \le b\}$ and an interval of the y-line, then
part I of the Fundamental Theorem is a special case of the
principle of independence of parameter (§2.5). Describe the
relationship between the two.

3.2 The Fundamental Theorem in Two Dimensions

The generalization of the Fundamental Theorem* to two
dimensions can have two very different forms, depending
on the way that the one-dimensional theorem
$\int_a^b F'(t)\,dt = F(b) - F(a)$ is interpreted. On the one
hand there is the generalization “the difference $F(Q) -
F(P)$ between two values of a function F(x, y) on the
xy-plane can be written as the integral of a 1-form over a
curve from P to Q” and on the other hand there is the
generalization “the integral of a 1-form A dx + B dy
around the boundary of a 2-dimensional region can be
written as the integral of a 2-form over the region
itself.”

*In the remainder of this chapter the
‘Fundamental Theorem’ will mean
just Part I of the Theorem.

[Figure: $F(Q) - F(P) = \sum \Delta F \approx \int dF$]

In the first generalization, a function F(x, y) is given
and it is claimed that if S is an oriented curve from P to
Q in the xy-plane then $F(Q) - F(P)$ can be written as
an integral over S, $\int_S (A\,dx + B\,dy)$, of some 1-form
A dx + B dy derived from F. Intuitively, the idea is that
the curve S can be well approximated by a polygonal
curve $P_0P_1P_2\ldots P_n$, where $P_0 = P$ and $P_n = Q$, in
which the line segments $P_{i-1}P_i$ are very short. Then
$$\begin{aligned}
F(Q) - F(P) &= [F(P_n) - F(P_{n-1})] + [F(P_{n-1}) - F(P_{n-2})] + \cdots + [F(P_1) - F(P_0)] \\
&= \sum \Delta F,
\end{aligned}$$
where the sum is over the line segments $P_{i-1}P_i$ making
up the polygonal curve and where $\Delta F = F(P_i) -
F(P_{i-1})$. One should be able, for simple functions F, to
find a 1-form dF = A dx + B dy whose values on short
line segments are nearly ΔF so that the above sum is
nearly an approximating sum of $\int_S dF$. Assuming that
there is such a 1-form dF = A dx + B dy, it is easy to
see that A must be $\frac{\partial F}{\partial x}$: for a short horizontal segment,
$$F(x + h, y) - F(x, y) \approx \int_{(x,y)}^{(x+h,y)} (A\,dx + B\,dy) = \int_x^{x+h} A(s, y)\,ds \approx A(x, y)\cdot h,$$
so that $A(x, y) \approx \frac{F(x + h, y) - F(x, y)}{h}$, which approaches $\frac{\partial F}{\partial x}$ as $h \to 0$.
Similarly, $B = \frac{\partial F}{\partial y}$, hence dF must be $\frac{\partial F}{\partial x}dx + \frac{\partial F}{\partial y}dy$.

[Figure: $F(Q) - F(P) = \sum \Delta F$ because of cancellation at interior points; $\int_{\partial S} = \sum \int_{\partial S_i}$ because of cancellation on interior boundaries]

Thus the expected theorem would be
$$F(Q) - F(P) = \int_S \left(\frac{\partial F}{\partial x}dx + \frac{\partial F}{\partial y}dy\right) \tag{1}$$
when S is an oriented curve from P to Q.

*The notation ∂S for ‘the boundary
of S’ is standard and will be used
throughout the book. It is not to be
confused with the use of the symbol
∂ to denote partial differentiation.
Note that an orientation of S gives an
orientation to ∂S as well.
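Formula (1) can be tested numerically for a particular F and curve. In this sketch F, the curve S and its velocity are illustrative choices, and the integral of dF over S is computed by pulling the 1-form back to the parameter t and taking a midpoint sum.

```python
import math

def F(x, y):
    return x * x * y + math.sin(y)       # an illustrative function F(x, y)

def dF(x, y):
    """Coefficients (dF/dx, dF/dy) of the 1-form dF."""
    return 2 * x * y, x * x + math.cos(y)

def curve(t):
    """An oriented curve S, 0 <= t <= 1, from P = curve(0) to Q = curve(1)."""
    return math.cos(math.pi * t), t * t + 0.5 * t

def velocity(t):
    return -math.pi * math.sin(math.pi * t), 2 * t + 0.5

def integral_of_dF(n=20_000):
    """Midpoint sum for the integral over t of (F_x dx/dt + F_y dy/dt)."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        fx, fy = dF(*curve(t))
        xp, yp = velocity(t)
        total += (fx * xp + fy * yp) * h
    return total

P, Q = curve(0.0), curve(1.0)
print(integral_of_dF(), F(*Q) - F(*P))
```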

In the second generalization, a 1-form A dx + B dy
is given and it is claimed that if S is an oriented 2-dimensional
domain in the xy-plane with boundary ∂S
oriented accordingly,* then $\int_{\partial S} (A\,dx + B\,dy)$ can be
written as an integral over S, $\int_S C\,dx\,dy$, of some 2-form
C dx dy derived from A dx + B dy. Intuitively, the idea
is that the domain S can be divided up into a large number
of small polygons, say $S_1, S_2, \ldots, S_n$. Then
$$\int_{\partial S} (A\,dx + B\,dy) = \sum_{i=1}^n \int_{\partial S_i} (A\,dx + B\,dy)$$
because the interior boundaries cancel, that is, any line
segment in the subdivision of S into $S_1, S_2, \ldots, S_n$ is
counted twice with opposite orientations in the sum on
the right and hence cancels out in the same way that the
terms $F(P_i)$ cancelled out of the sum in the previous
case. The idea, then, is that there should be a 2-form
C dx dy such that
$$\int_{\partial S_i} (A\,dx + B\,dy) \approx \int_{S_i} C\,dx\,dy$$
holds for small polygons $S_i$ so that the right side above is
nearly $\int_S C\,dx\,dy$. Assuming that there is such a 2-form
C dx dy, it is easy to see what it must be; taking $S_i$ to be
a rectangle $\{a \le x \le b, c \le y \le d\}$ oriented counterclockwise,
$$\begin{aligned}
\int_{\partial S_i} (A\,dx + B\,dy) &= \int_a^b A(x, c)\,dx + \int_c^d B(b, y)\,dy + \int_b^a A(x, d)\,dx + \int_d^c B(a, y)\,dy \\
&= \int_c^d [B(b, y) - B(a, y)]\,dy - \int_a^b [A(x, d) - A(x, c)]\,dx \\
&= \int_c^d \left[\int_a^b \frac{\partial B}{\partial x}(x, y)\,dx\right] dy - \int_a^b \left[\int_c^d \frac{\partial A}{\partial y}(x, y)\,dy\right] dx \quad \text{(by the Fundamental Theorem)} \\
&= \int_{S_i} \left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) dx\,dy \quad \text{(by formula (1) of §2.6)},
\end{aligned}$$
and hence
$$\int_{\partial S_i} (A\,dx + B\,dy) = \int_{S_i} \left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) dx\,dy.$$
Since this holds for all rectangles, the desired 2-form must
be $C\,dx\,dy = \left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) dx\,dy$. Thus the expected
theorem would be
$$\int_{\partial S} (A\,dx + B\,dy) = \int_S \left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) dx\,dy. \tag{2}$$
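The rectangle computation above can be checked numerically. This is a sketch with illustrative coefficients A, B (and their partial derivatives written out by hand); the boundary is traversed counterclockwise as in the derivation, and midpoint sums replace the integrals.

```python
import math

# Illustrative coefficients of the 1-form A dx + B dy (not from the text)
def A(x, y):
    return -y ** 3 + x * y

def B(x, y):
    return x * math.exp(y)

def dA_dy(x, y):
    return -3 * y ** 2 + x

def dB_dx(x, y):
    return math.exp(y)

def mid(f, lo, hi, n=4000):
    """Midpoint approximating sum for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

a, b, c, d = 0.0, 2.0, -1.0, 1.0

# Boundary oriented counterclockwise: bottom, right, top (backwards), left (down)
boundary = (mid(lambda x: A(x, c), a, b)
            + mid(lambda y: B(b, y), c, d)
            - mid(lambda x: A(x, d), a, b)
            - mid(lambda y: B(a, y), c, d))

# Right side of (2), written as an iterated integral (formula (1) of 2.6)
interior = mid(lambda y: mid(lambda x: dB_dx(x, y) - dA_dy(x, y), a, b, 400),
               c, d, 400)

print(boundary, interior)
```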


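There is no need to memorize this formula because it
is easily derived from the usual rules for computing with
differentials:
$$\begin{aligned}
d(A\,dx + B\,dy) &= dA\,dx + dB\,dy = \left(\frac{\partial A}{\partial x}dx + \frac{\partial A}{\partial y}dy\right) dx + \left(\frac{\partial B}{\partial x}dx + \frac{\partial B}{\partial y}dy\right) dy \\
&= \left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) dx\,dy
\end{aligned}$$
(because $dA = \frac{\partial A}{\partial x}dx + \frac{\partial A}{\partial y}dy$ and because $dx\,dx = 0$,
$dx\,dy = -dy\,dx$, $dy\,dy = 0$). The 2-form
$$\left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) dx\,dy$$
is called the derived 2-form of the 1-form A dx + B dy,
written d(A dx + B dy), so that the formula (2) becomes
$$\int_{\partial S} (A\,dx + B\,dy) = \int_S d(A\,dx + B\,dy). \tag{2'}$$

*Except that physicists use the
opposite sign and speak of the work
done by the field rather than the
work done by the mover of the
particle. See §8.8.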

The formulas (1), (2) can be illustrated by physical
ideas. In physics a potential function is a function with
the property that $F(Q) - F(P)$ is equal to the amount of
work required to go from the point P to the point Q.*
Formula (1) says simply that if F is a potential function
then the force field is described by the 1-form dF in the
manner described in §2.1. [For example, $\frac{\partial F}{\partial x}(\bar x, \bar y)$ is the
work required per unit displacement in the x-direction
near $(\bar x, \bar y)$; this is minus the x-component of the force.]
In the formula (2) it is useful to regard the given 1-form
as describing a flow; in this case, however, it is natural to
write the 1-form as $A\,dy - B\,dx$ so that A is the x-component
of the flow and B the y-component (see
Exercise 5, §1.1). Then the formula (2) becomes
$$\int_{\partial S} (A\,dy - B\,dx) = \int_S \left(\frac{\partial A}{\partial x} + \frac{\partial B}{\partial y}\right) dx\,dy.$$
The 2-form $\left(\frac{\partial A}{\partial x} + \frac{\partial B}{\partial y}\right) dx\,dy$ is also called the divergence
of the flow represented by $A\,dy - B\,dx$ because it
gives the rate at which the fluid is flowing out of small
rectangles.
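The description of the divergence as a rate of outflow from small rectangles can be illustrated numerically: the flow of $A\,dy - B\,dx$ out of a square of side h, divided by $h^2$, should approach the divergence at the corner P as h shrinks. The flow components and the point P below are assumptions of the sketch.

```python
import math

# Illustrative flow A dy - B dx with components (A, B) (not from the text)
def A(x, y):
    return x * x * y

def B(x, y):
    return math.sin(x) + y

def divergence(x, y):
    return 2 * x * y + 1.0          # dA/dx + dB/dy for this A and B

def outflow(x0, y0, h, n=200):
    """Integral of A dy - B dx counterclockwise around the square with
    lower-left corner (x0, y0) and side h (midpoint sums on each side)."""
    s = h / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * s
        total += -B(x0 + u, y0) * s            # bottom: dx = +s
        total += A(x0 + h, y0 + u) * s         # right:  dy = +s
        total += B(x0 + h - u, y0 + h) * s     # top:    dx = -s
        total += -A(x0, y0 + h - u) * s        # left:   dy = -s
    return total

P = (0.5, 0.3)
errors = []
for h in (0.1, 0.01, 0.001):
    rate_per_area = outflow(P[0], P[1], h) / (h * h)
    errors.append(abs(rate_per_area - divergence(*P)))
print(errors)
```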

Formulas (1) and (2) require proof, of course, and 
conditions must be placed on S, F, A, B, etc., in order to 
ensure that they are true. As for Chapters 1 and 2, 
rigorous proofs will be postponed to Chapter 6 because 
of the technical difficulties which they present. However, 
some discussion of the proofs is useful at this point both 
because it sheds light on the meaning of the theorems and 
because it gives an idea of the sorts of difficulties involved 
in the proofs to come. 

The statement of formula (1) assumes that the partial
derivatives $\frac{\partial F}{\partial x}$, $\frac{\partial F}{\partial y}$ exist and are continuous functions
of (x, y), and assumes as well that the curve S can be
parameterized, $S = \{(x(t), y(t)): a \le t \le b\}$, by functions
x(t), y(t) which have continuous derivatives. These
assumptions are required in order for the definition of the
number $\int_S \left(\frac{\partial F}{\partial x}dx + \frac{\partial F}{\partial y}dy\right)$ to apply. This number is
then defined to be $\int_a^b \left(\frac{\partial F}{\partial x}\frac{dx}{dt} + \frac{\partial F}{\partial y}\frac{dy}{dt}\right) dt$. The number
on the left side of (1) is
$$F(Q) - F(P) = F(x(b), y(b)) - F(x(a), y(a)) = \int_a^b \frac{d}{dt}\,F(x(t), y(t))\,dt$$
by the Fundamental Theorem. Equation (1) therefore
follows from the Chain Rule of Differentiation with which
the reader may already be familiar:
$$\frac{d}{dt}\,F(x(t), y(t)) = \frac{\partial F}{\partial x}\frac{dx}{dt} + \frac{\partial F}{\partial y}\frac{dy}{dt}. \tag{3}$$
This formula of differential calculus, which is proved in
§5.3, therefore implies the theorem (1).

Theorem (2) was actually proved above in the case
where S is a rectangle. This case is particularly simple
because there is a natural way to parameterize the
boundary of a rectangle, namely by the coordinate
functions, so that the number $\int_{\partial S} (A\,dx + B\,dy)$ can
be written in a very explicit form. The essential difficulty
in proving (2) for ‘arbitrary’ domains S is simply that the
number $\int_{\partial S} (A\,dx + B\,dy)$ has not been satisfactorily
defined (see §2.5). When it is defined in Chapter 6 it will
be defined in such a way that the general formula (2) is
reduced to the case where S is a rectangle, which was
proved above.

Exercises

1 Sketch the following flows. Indicate with a ‘+’ those
regions of the plane where the flow is diverging, and with a
‘−’ those where it is converging. Compute the divergence in
each case:


(a) dx (b) y dx (c) x dx
(d) $x^2\,dy$ (e) x dx + y dy (f) $x\,dy - y\,dx$.


2 What would the divergence of flow from a source at the
origin (see Exercise 4, §2.2) be expected to be? Check by
computation. Since the 1-form describing this flow is not
defined at the origin, neither is its divergence; therefore the
flow across the boundary of a domain containing the origin
cannot be described as the integral of the divergence over the
domain. What is the rate of flow across the boundary of an
oriented domain which contains the origin? Which does not
contain the origin?


3 Show how the function F(x, y) = 1 (x? + y3 12 
y 


can be used in finding the amount of work required for dis- 
placements in a central force field (see Exercise 1, §2.2). 


4 Use the Fundamental Theorem to prove the formula (1) 
in cases where S is a horizontal line segment parameterized by 
x or a vertical line segment parameterized by y. 


5 If a flow is described by the 1-form x dx + y dy find the
rate of flow across

(a) the line segment from (0, 0) to $(x_0, y_0)$;
(b) the broken line segment from (0, 0) to $(x_0, 0)$ to
$(x_0, y_0)$; and
(c) the parabolic arc $(t^2x_0, ty_0)$, $0 \le t \le 1$,
by direct computation. Give a physical explanation of the
result.


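6 Show that if a flow is described by a (continuous) 1-form
of the type $\frac{\partial F}{\partial x}dx + \frac{\partial F}{\partial y}dy$ then the flow across the boundary
of any oriented rectangle is zero. Use formula (2) to conclude
that if F is a function such that $\frac{\partial}{\partial x}\left(\frac{\partial F}{\partial y}\right)$ and $\frac{\partial}{\partial y}\left(\frac{\partial F}{\partial x}\right)$
both exist and are continuous then
$$\frac{\partial}{\partial x}\left(\frac{\partial F}{\partial y}\right) = \frac{\partial}{\partial y}\left(\frac{\partial F}{\partial x}\right).$$
This theorem is referred to as the equality of the mixed
partials.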
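For a particular F the equality of the mixed partials can be observed numerically. In this sketch $F(x, y) = e^{xy}\sin(x + 2y)$ is an illustrative choice, its first partials are written out by hand, and the second differentiation in each order is approximated by a central difference.

```python
import math

# An illustrative F(x, y) = exp(xy) sin(x + 2y), through its first partials
def Fx(x, y):
    return math.exp(x * y) * (y * math.sin(x + 2 * y) + math.cos(x + 2 * y))

def Fy(x, y):
    return math.exp(x * y) * (x * math.sin(x + 2 * y) + 2 * math.cos(x + 2 * y))

def d_dy(g, x, y, h=1e-5):
    return (g(x, y + h) - g(x, y - h)) / (2 * h)    # central difference in y

def d_dx(g, x, y, h=1e-5):
    return (g(x + h, y) - g(x - h, y)) / (2 * h)    # central difference in x

points = [(0.0, 0.0), (0.4, -0.7), (1.2, 0.3)]
for (x, y) in points:
    print(d_dy(Fx, x, y), d_dx(Fy, x, y))
```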


7 Given a 1-form A dx + B dy, the preceding exercise shows
that if $A\,dx + B\,dy = \frac{\partial F}{\partial x}dx + \frac{\partial F}{\partial y}dy$ for some function F,
then $\frac{\partial A}{\partial y} = \frac{\partial B}{\partial x}$. Prove that if A, B are defined at all points of
the plane and satisfy $\frac{\partial A}{\partial y} = \frac{\partial B}{\partial x}$ then, conversely, A dx +
B dy = dF for some function F. [Following the proof of part
II of the fundamental theorem, define $F_1(x, y)$ to be the
integral of A dx + B dy from (0, 0) to (x, 0) to (x, y) and
$F_2(x, y)$ to be the integral from (0, 0) to (0, y) to (x, y). Show
that $F_1 = F_2$ and that the function they define has the right
partial derivatives.]
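The construction suggested in the hint to Exercise 7 can be carried out numerically for a sample closed form. In this sketch $A = 2xy$, $B = x^2 + 3y^2$ (so that $\partial A/\partial y = \partial B/\partial x = 2x$) are illustrative choices, integrals along horizontal and vertical segments become ordinary integrals, and midpoint sums approximate them.

```python
# Illustrative closed form A dx + B dy: dA/dy = 2x = dB/dx (not from the text)
def A(x, y):
    return 2 * x * y

def B(x, y):
    return x * x + 3 * y * y

def mid(f, lo, hi, n=2000):
    """Midpoint approximating sum for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

def F1(x, y):
    # (0, 0) -> (x, 0) horizontally (only A dx counts), then up to (x, y)
    return mid(lambda s: A(s, 0.0), 0.0, x) + mid(lambda s: B(x, s), 0.0, y)

def F2(x, y):
    # (0, 0) -> (0, y) vertically (only B dy counts), then across to (x, y)
    return mid(lambda s: B(0.0, s), 0.0, y) + mid(lambda s: A(s, y), 0.0, x)

x, y = 1.3, -0.8
print(F1(x, y), F2(x, y), x * x * y + y ** 3)   # all close to x^2 y + y^3

h = 1e-3
print((F1(x + h, y) - F1(x - h, y)) / (2 * h), A(x, y))  # dF/dx vs A
print((F1(x, y + h) - F1(x, y - h)) / (2 * h), B(x, y))  # dF/dy vs B
```

For this sample form both paths give $x^2y + y^3$, whose partial derivatives are A and B.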


8 A 1-form A dx + B dy is called closed if $\frac{\partial A}{\partial y} = \frac{\partial B}{\partial x}$. It is
called exact if there is a function F such that $\frac{\partial F}{\partial x} = A$,
$\frac{\partial F}{\partial y} = B$. Exercise 6 shows that every exact form is closed,
while Exercise 7 shows that every closed form is exact provided
it and its derived form are defined everywhere. Determine
which of the following forms are closed, which are exact,
and find functions F for those which are exact.

(a) (x + y) dx + (x + y) dy
(b) x dy + y dx
(c) x dx + y dy
(d) $(ye^{xy}\cos y)\,dx + (xe^{xy}\cos y - e^{xy}\sin y)\,dy$


© doga + ae + (Z) a 


(f) $\dfrac{x\,dx + y\,dy}{x^2 + y^2}$
(g) $\dfrac{y\,dx - x\,dy}{x^2 + y^2}$

9 A force field A dx + B dy is called conservative if the
amount of work required for displacement around any closed
path is zero. Show that if A dx + B dy is conservative, then
$\frac{\partial A}{\partial y} = \frac{\partial B}{\partial x}$. Is the converse true? Show that A dx + B dy is
conservative if and only if there is a function F such that
$\frac{\partial F}{\partial x} = A$, $\frac{\partial F}{\partial y} = B$, that is, if and only if the force field can be
described by a potential function F.
10 ‘Express in polar coordinates’ the 1-forms
$$dx, \quad dy, \quad x\,dy, \quad x\,dy - y\,dx, \quad \frac{x\,dx + y\,dy}{x^2 + y^2}, \quad \frac{x\,dy - y\,dx}{x^2 + y^2},$$
that is, find the pullbacks under the map $x = r\cos\theta$,
$y = r\sin\theta$. Which of these forms are closed and which are
exact? Which of the pullbacks are closed and which are exact?
What would you expect the ‘expression in polar coordinates’
of flow from a source (Exercise 2) to be? Verify the answer.

11 Let A dx + B dy be a 1-form defined on the disk
$D = \{(x, y): x^2 + y^2 \le 1\}$ oriented counterclockwise, and
assume that $\frac{\partial A}{\partial x}, \frac{\partial A}{\partial y}, \frac{\partial B}{\partial x}, \frac{\partial B}{\partial y}$ are all defined and continuous
on D. Convert $\int_{\partial D} (A\,dx + B\,dy)$ to an integral over the
boundary of the rectangle $\{0 \le \theta \le 2\pi, 0 \le r \le 1\}$ in polar
coordinates. Use the formula (2) to convert this to the integral
of a 2-form in r and θ over the entire rectangle. Simplify by
using the chain rule to write $\frac{\partial A}{\partial r}$ in terms of $\frac{\partial A}{\partial x}$ and $\frac{\partial A}{\partial y}$, etc.
Compare the result to $\int_D \left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) dx\,dy$. Conclude that
$$\int_{\partial D} (A\,dx + B\,dy) = \int_D \left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) dx\,dy$$
is valid when polar coordinates are used to define these
integrals.

12 Let S be a curve in the plane parameterized by the
coordinate x, $S = \{(x, f(x)): a \le x \le b\}$ where f is a
differentiable function. Use $d(y\,dx) = dy\,dx = -dx\,dy$ to
interpret $\int_S y\,dx$ as an oriented area. Express this integral in
terms of x. (This is the interpretation of the integral as ‘the
area under a curve’. The widespread idea that an integral ‘is’
the area under a curve is very unfortunate because it completely
obscures the meaning of the Fundamental Theorem.
An integral ‘is’ the limit of a sum, and area ‘is’ a double
integral.)

13 Approximating the domain S by an n-sided polygon and
passing to the limit as $n \to \infty$, show that the formula of
Exercise 5(b), §1.3, becomes
$$\int_S dx\,dy = \int_{\partial S} \tfrac{1}{2}(x\,dy - y\,dx)$$
which is a special case of (2). Apply this formula to the case
where S is the unit disk by using polar coordinates (Exercise
11).

14 Prove the formula
$$\int_{-\infty}^{\infty} \frac{du}{1 + u^2} = \pi$$
by applying the formula of Exercise 13 to the circle parameterized
by stereographic projection (see Exercise 3, §2.4).

3.3 The Fundamental Theorem in Three Dimensions


The extension of the ideas of §3.2 to three dimensions is
immediate. If F(x, y, z) is a function on xyz-space and if
S is an oriented curve from P to Q then S can be approximated
by a polygonal curve consisting of short line
segments and $F(Q) - F(P)$ can be written as $\sum \Delta F$ by
cancellation on interior boundaries, the boundaries in
this case being points. Passing to the limit, the formula
$F(Q) - F(P) = \sum \Delta F$ becomes
$$F(Q) - F(P) = \int_S dF \tag{1}$$
where dF is a 1-form derived from F. If A dx + B dy +
C dz is a 1-form on xyz-space and if S is an oriented
surface with boundary curve ∂S then S can be approximated
by a polygonal surface consisting of small polygons
$S_i$ and $\int_{\partial S} (A\,dx + B\,dy + C\,dz)$ can be written as
$\sum \int_{\partial S_i} (A\,dx + B\,dy + C\,dz)$ by cancellation on the
interior boundaries, the boundaries in this case being
small curves. Passing to the limit in this formula, it
becomes
$$\int_{\partial S} (A\,dx + B\,dy + C\,dz) = \int_S d(A\,dx + B\,dy + C\,dz) \tag{2}$$
where d(A dx + B dy + C dz) is a 2-form derived from
A dx + B dy + C dz. Finally, if A dy dz + B dz dx +
C dx dy is a 2-form on xyz-space and if S is an oriented
solid with boundary surface ∂S, then S can be divided
into a large number of small polyhedra $S_i$ and
$\int_{\partial S} (A\,dy\,dz + B\,dz\,dx + C\,dx\,dy)$ can be rewritten as
$\sum \int_{\partial S_i} (A\,dy\,dz + B\,dz\,dx + C\,dx\,dy)$ by cancellation
on the interior boundaries, the boundaries in this case
being small surfaces. Passing to the limit in this formula
it becomes
$$\int_{\partial S} (A\,dy\,dz + B\,dz\,dx + C\,dx\,dy) = \int_S d(A\,dy\,dz + B\,dz\,dx + C\,dx\,dy) \tag{3}$$
where d(A dy dz + B dz dx + C dx dy) is a 3-form derived
from A dy dz + B dz dx + C dx dy.
In the formula (1) the derived 1-form dF is of course
$dF = \frac{\partial F}{\partial x}dx + \frac{\partial F}{\partial y}dy + \frac{\partial F}{\partial z}dz$. The proof of the formula
for parameterized curves S reduces immediately, using
the Fundamental Theorem, to the Chain Rule of Differentiation
$$\frac{dF}{dt} = \frac{\partial F}{\partial x}\frac{dx}{dt} + \frac{\partial F}{\partial y}\frac{dy}{dt} + \frac{\partial F}{\partial z}\frac{dz}{dt}$$
where $x = x(t)$, $y = y(t)$, $z = z(t)$ and where F is a
function of t by composition $F(x(t), y(t), z(t))$. The
Chain Rule, and hence (1), is proved in Chapter 5.

The formula (3) presupposes that ∂S is an oriented
surface, i.e. that a convention has been established for
orienting the bounding surface of an oriented solid.
Geometrically, the usual convention for doing this is
given by the rule “the boundary of a right-handed solid
is oriented to be counterclockwise as seen from the outside
and the boundary of a left-handed solid is oriented
in the opposite way.” Analytically, an orientation of a
surface is a rule for deciding whether a triple of nearby
points $P_0P_1P_2$ on the surface which are not collinear is
‘positively’ or ‘negatively’ oriented. The convention above
states that $P_0P_1P_2$ is positive if the orientation of
$P_0P_1P_2P_3$ agrees with the orientation of S when $P_3$ is a
nearby point outside of S.

An alternative statement of this convention for
orienting the boundary of an oriented solid is the following:
If S is a rectangular parallelepiped $\{a \le x \le b,
c \le y \le d, e \le z \le f\}$ oriented by dx dy dz (right-handed),
then the side x = b of S is oriented by dy dz
(counterclockwise as seen from x > b) and the remaining
sides are oriented accordingly. Stated in this way the
convention is seen as a generalization of the rule for two
dimensions: If S is a rectangle $\{a \le x \le b, c \le y \le d\}$
oriented by dx dy then the side x = b of S is oriented by
dy. (The orientation dy dz of a plane x = const. is
established by a triple of points $(\bar x, \bar y, \bar z)$, $(\bar x, \bar y + 1, \bar z)$,
$(\bar x, \bar y, \bar z + 1)$ in that order, the orientation dy of a line
x = const., z = const. by $(\bar x, \bar y, \bar z)$, $(\bar x, \bar y + 1, \bar z)$, the
orientation dx dy dz of space by $(\bar x, \bar y, \bar z)$, $(\bar x + 1, \bar y, \bar z)$,
$(\bar x, \bar y + 1, \bar z)$, $(\bar x, \bar y, \bar z + 1)$, etc.)

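Using this convention, the integral of A dy dz over the
boundary of a right-handed rectangular parallelepiped
$\{a \le x \le b, c \le y \le d, e \le z \le f\}$ is the integral of
the 2-form $[A(b, y, z) - A(a, y, z)]\,dy\,dz$ over the rectangle
$\{c \le y \le d, e \le z \le f\}$ oriented dy dz. Writing
$A(b, y, z) - A(a, y, z) = \int_a^b \frac{\partial A}{\partial x}(x, y, z)\,dx$ by the Fundamental
Theorem and applying the 3-dimensional
analog of formula (1) of §2.6 gives then
$$\int_{\partial S} A\,dy\,dz = \int_S \frac{\partial A}{\partial x}\,dx\,dy\,dz.$$
(This formula is the real reason for the orientation convention
above; see Exercise 2, §3.4.) Similarly
$$\int_{\partial S} B\,dz\,dx = \int_S \frac{\partial B}{\partial y}\,dy\,dz\,dx, \qquad \int_{\partial S} C\,dx\,dy = \int_S \frac{\partial C}{\partial z}\,dz\,dx\,dy$$
and, adding these,
$$\int_{\partial S} (A\,dy\,dz + B\,dz\,dx + C\,dx\,dy) = \int_S \left(\frac{\partial A}{\partial x} + \frac{\partial B}{\partial y} + \frac{\partial C}{\partial z}\right) dx\,dy\,dz.$$
This shows that the derived form in (3) must be defined
by
$$d(A\,dy\,dz + B\,dz\,dx + C\,dx\,dy) = \left(\frac{\partial A}{\partial x} + \frac{\partial B}{\partial y} + \frac{\partial C}{\partial z}\right) dx\,dy\,dz. \tag{4}$$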


Chapter3 | Integration and Differentiation 68 


Then formula (3) is proved for the case where S is a 
rectangular parallelepiped. To prove (3) for more general 
oriented solids S the essential difficulty is to define the 
left-hand side (integration over the boundary). When 
this is done (in Chapter 6) the formula (3) for ‘arbitrary’ 
solids S will follow from the case of rectangular parallelepipeds,
which has just been proved. Until then the
formula (3) should be accepted as true on the basis of the 
intuitive argument by which it was derived. 

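To find the derived 2-form d(A dx + B dy + C dz)
which should appear on the right side of (2) it suffices to
consider cases in which S is a rectangle in a coordinate
direction, for instance $S = \{a \le x \le b,\ c \le y \le d,\ z = \text{const.}\}$
oriented dx dy. In this case dz is zero on S and on ∂S, so formula (2)
of §3.2 applies and gives

$$\int_{\partial S} (A\,dx + B\,dy + C\,dz) = \int_{\partial S} (A\,dx + B\,dy) = \int_S \left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) dx\,dy;$$

hence the dx dy-component of d(A dx + B dy + C dz)
must be $\left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) dx\,dy$. Applying the same argument
in the zx- and yz-directions gives the other components and hence the formula

$$d(A\,dx + B\,dy + C\,dz) = \left(\frac{\partial C}{\partial y} - \frac{\partial B}{\partial z}\right) dy\,dz + \left(\frac{\partial A}{\partial z} - \frac{\partial C}{\partial x}\right) dz\,dx + \left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) dx\,dy. \tag{5}$$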
Note that this formula, like formula (4), need not be 
memorized because it can be derived by the usual rules. 
For example, taking B and C to be zero for the sake of 
simplicity, one obtains 


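$$\begin{aligned}
d(A\,dx) &= dA\,dx\\
&= \left(\frac{\partial A}{\partial x}\,dx + \frac{\partial A}{\partial y}\,dy + \frac{\partial A}{\partial z}\,dz\right) dx\\
&= \frac{\partial A}{\partial z}\,dz\,dx - \frac{\partial A}{\partial y}\,dx\,dy.
\end{aligned}$$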
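Formula (5) can also be evaluated numerically. The sketch below approximates each partial derivative by a central difference. The helper name `d_one_form` and the step `h` are illustrative assumptions. With A = yz and B = C = 0 it reproduces the computation just made by the usual rules: d(A dx) = ∂A/∂z dz dx − ∂A/∂y dx dy = y dz dx − z dx dy.

```python
def d_one_form(A, B, C, point, h=1e-5):
    """Approximate the (dy dz, dz dx, dx dy) components of d(A dx + B dy + C dz)
    at `point`, following formula (5) with central differences."""
    def partial(F, i):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        return (F(*plus) - F(*minus)) / (2 * h)

    X, Y, Z = 0, 1, 2
    dydz = partial(C, Y) - partial(B, Z)
    dzdx = partial(A, Z) - partial(C, X)
    dxdy = partial(B, X) - partial(A, Y)
    return dydz, dzdx, dxdy

zero = lambda x, y, z: 0.0
# A = yz, B = C = 0: the result should be approximately (0, y, -z), here (0, 2, -3).
print(d_one_form(lambda x, y, z: y * z, zero, zero, (1.0, 2.0, 3.0)))
```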


Thus in all three formulas (1), (2), (3) the derived forms 
on the right can be said to be ‘found by the usual rules’. 

Formula (2) is the most difficult of the three to prove, 
because in this case neither side of the equation has been 
satisfactorily defined. Taking S to be a surface with a 
specific parametric description, S = {(x(u, v), y(u, v), 
z(u, v)): (u,v) in D}, the integral on the right side of (2) 
becomes the integral of a 2-form in uv over D, while the




integral on the left becomes the integral of a 1-form in 
uv over the bounding curve ∂D. If these two integrals are
always equal, then it must be true that the 2-form is the 
derived form of the 1-form; that is, the derived form of 
the pullback must be equal to the pullback of the derived 
form. If this is shown to be true, then, just as the formula 
(1) was reduced to the Fundamental Theorem, the 
formula (2) will be reduced to the formula (2) of §3.2 
(when the S of (2) is parameterized). Taking B = C = 0 
for the sake of simplicity, the desired formula is that d of 
the pullback of A dx is the pullback of d(A dx). This can 
be proved by writing 


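$$\begin{aligned}
d(A\,dx) &= dA\,dx\\
&= \left(\frac{\partial A}{\partial x}\,dx + \frac{\partial A}{\partial y}\,dy + \frac{\partial A}{\partial z}\,dz\right) dx\\
&= \left[\frac{\partial A}{\partial x}\left(\frac{\partial x}{\partial u}du + \frac{\partial x}{\partial v}dv\right) + \frac{\partial A}{\partial y}\left(\frac{\partial y}{\partial u}du + \frac{\partial y}{\partial v}dv\right) + \frac{\partial A}{\partial z}\left(\frac{\partial z}{\partial u}du + \frac{\partial z}{\partial v}dv\right)\right] dx\\
&= \left[\left(\frac{\partial A}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial A}{\partial y}\frac{\partial y}{\partial u} + \frac{\partial A}{\partial z}\frac{\partial z}{\partial u}\right) du + \left(\frac{\partial A}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial A}{\partial y}\frac{\partial y}{\partial v} + \frac{\partial A}{\partial z}\frac{\partial z}{\partial v}\right) dv\right] dx.
\end{aligned}$$

Using the Chain Rule

$$\frac{\partial A}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial A}{\partial y}\frac{\partial y}{\partial u} + \frac{\partial A}{\partial z}\frac{\partial z}{\partial u} = \frac{\partial A}{\partial u} \tag{6}$$

and the analogous formula for $\partial A/\partial v$, this becomes

$$\begin{aligned}
&\left(\frac{\partial A}{\partial u}du + \frac{\partial A}{\partial v}dv\right)\left(\frac{\partial x}{\partial u}du + \frac{\partial x}{\partial v}dv\right)\\
&\qquad = \left(\frac{\partial A}{\partial u}\frac{\partial x}{\partial v} - \frac{\partial A}{\partial v}\frac{\partial x}{\partial u}\right) du\,dv.
\end{aligned}$$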


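On the other hand, the pullback of A dx is

$$A(u, v)\left(\frac{\partial x}{\partial u}du + \frac{\partial x}{\partial v}dv\right)$$

and the derived form of this is

$$\left[\frac{\partial}{\partial u}\left(A\frac{\partial x}{\partial v}\right) - \frac{\partial}{\partial v}\left(A\frac{\partial x}{\partial u}\right)\right] du\,dv,$$

which is the above plus

$$A\left(\frac{\partial^2 x}{\partial u\,\partial v} - \frac{\partial^2 x}{\partial v\,\partial u}\right) du\,dv.$$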
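The identity being established here, that d of the pullback equals the pullback of d, can be watched numerically. In the sketch below the map `phi` and the coefficient `A` are hypothetical smooth choices, and every derivative is a central difference with step `H` (also an assumption). The two du dv coefficients agree to within discretization error. This is consistent with the extra term A(∂²x/∂u∂v − ∂²x/∂v∂u) vanishing for a twice continuously differentiable map.

```python
import math

H = 1e-4  # finite-difference step (an assumption, not from the text)

def phi(u, v):
    """Hypothetical smooth map (u, v) -> (x, y, z)."""
    return u * math.cos(v), u * math.sin(v), u * v

def A(x, y, z):
    """Hypothetical coefficient of the 1-form A dx."""
    return x * y + math.sin(z)

def d_du(f, u, v):
    return (f(u + H, v) - f(u - H, v)) / (2 * H)

def d_dv(f, u, v):
    return (f(u, v + H) - f(u, v - H)) / (2 * H)

def comp(k):
    """k-th coordinate function of phi, as a function of (u, v)."""
    return lambda u, v: phi(u, v)[k]

def pullback_of_d(u, v):
    """du dv coefficient of the pullback of d(A dx) = A_z dz dx - A_y dx dy."""
    x, y, z = phi(u, v)
    A_y = (A(x, y + H, z) - A(x, y - H, z)) / (2 * H)
    A_z = (A(x, y, z + H) - A(x, y, z - H)) / (2 * H)
    xu, xv = d_du(comp(0), u, v), d_dv(comp(0), u, v)
    yu, yv = d_du(comp(1), u, v), d_dv(comp(1), u, v)
    zu, zv = d_du(comp(2), u, v), d_dv(comp(2), u, v)
    # dz dx pulls back to (zu*xv - zv*xu) du dv, dx dy to (xu*yv - xv*yu) du dv.
    return A_z * (zu * xv - zv * xu) - A_y * (xu * yv - xv * yu)

def d_of_pullback(u, v):
    """du dv coefficient of d(P du + Q dv), i.e. Q_u - P_v, where
    P du + Q dv = A(phi(u, v)) (x_u du + x_v dv) is the pullback of A dx."""
    A_pb = lambda s, t: A(*phi(s, t))
    P = lambda s, t: A_pb(s, t) * d_du(comp(0), s, t)
    Q = lambda s, t: A_pb(s, t) * d_dv(comp(0), s, t)
    return d_du(Q, u, v) - d_dv(P, u, v)

print(pullback_of_d(1.2, 0.7), d_of_pullback(1.2, 0.7))
```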


But if the function x(u, v) is assumed to have continuous
second partial derivatives then by Exercise 6 of §3.2 it
follows that d(dx) = 0; that is,

$$\frac{\partial^2 x}{\partial u\,\partial v} = \frac{\partial^2 x}{\partial v\,\partial u},$$

and d of the pullback of A dx is identical to the pullback
of d(A dx).

This is the method by which formula (2) will be
proved in Chapter 6. Using the Chain Rule (6) (proved
in Chapter 5) it follows that the pullback of the derived
form is the derived form of the pullback (when the map
is twice continuously differentiable), so that the desired
formula is reduced to a previous case (formula (2) of
§3.2). This in turn can be reduced to the case where S is
a rectangle, which has already been proved above.


Exercises

1 Find the derived 3-form of the 2-form

$$\frac{x\,dy\,dz + y\,dz\,dx + z\,dx\,dy}{(x^2 + y^2 + z^2)^{3/2}}.$$


Interpret the result in terms of Exercise 3, §2.1. 


2 The formula (3) implies that the volume of a solid S is
equal to

$$\frac{1}{3}\int_{\partial S} x\,dy\,dz + y\,dz\,dx + z\,dx\,dy$$

when ∂S is appropriately oriented. Use this formula to find:


(a) the volume of the unit sphere using spherical co- 
ordinates. 

(b) the volume of the unit sphere using the coordinates 
of stereographic projection. (Exercise 3, §2.4. Evaluate 
the resulting double integral by converting to polar 
coordinates.) 

(c) the volume of the torus of Exercise 7, §2.5. 


3 Given a 3-dimensional region D in xyz-space, imagine 
that ∂D is a rigid body (say a metal shell) and that D is filled
with a gas under pressure. Then the x-component of the force
exerted by the gas on any piece of ∂D is proportional to the
oriented area of its projection on the yz-plane. Show that the
total x-component of the force on ∂D is zero.




*See §8.7 


4 Redo the computation of Exercise 11, §3.2 using the 
method of the text to prove that the pullback of the derived 
form is the derived form of the pullback. 


5 Faraday’s Law of Induction says that if the electric force
field is given by the 1-form $E_1\,dx + E_2\,dy + E_3\,dz$ and if the
magnetic flux is given by the 2-form $H_1\,dy\,dz + H_2\,dz\,dx + H_3\,dx\,dy$,
where $E_1, E_2, \ldots, H_3$ are functions of x, y, z, t,
then there is a constant k such that

$$\frac{d}{dt}\int_S H = k\int_{\partial S} E$$


for any surface S. Use (2) to state this as an equation involving 
the partial derivatives of E and H. 


6 Any 2-form on space can be pictured in terms of ‘lines of 
force’ such that the integral of the 2-form over any surface is 
equal to the number of lines which cross the surface.* How 
would flow from a unit source at the origin (Exercise 1) be 
represented in this way? What is the description of the 
derived form of the 2-form in terms of lines of force and their 
endings? How is the fact that dH = 0 reflected in the picture 
of the lines of magnetic force? The electric displacement is 
represented by a 2-form $E_1\,dy\,dz + E_2\,dz\,dx + E_3\,dx\,dy$
whose derived form is equal to the charge density (a 3-form). 
Describe this in terms of lines of force. 


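7 As in Exercise 8 of the preceding section, a 1-form on
space is called exact if it can be written in the form

$$\frac{\partial F}{\partial x}\,dx + \frac{\partial F}{\partial y}\,dy + \frac{\partial F}{\partial z}\,dz$$

for some function F(x, y, z). A 1-form A dx + B dy + C dz
on space is called closed if the derived form

$$\left(\frac{\partial C}{\partial y} - \frac{\partial B}{\partial z}\right) dy\,dz + \left(\frac{\partial A}{\partial z} - \frac{\partial C}{\partial x}\right) dz\,dx + \left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) dx\,dy$$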


is zero. Prove that an exact 1-form is always closed 


(a) by computation using the equality of mixed partials, 
and 

(b) by arguing directly from the geometrical meaning of 
the derived form. 


8 A 2-form on space is called exact if it is the derived form

$$\left(\frac{\partial C}{\partial y} - \frac{\partial B}{\partial z}\right) dy\,dz + \left(\frac{\partial A}{\partial z} - \frac{\partial C}{\partial x}\right) dz\,dx + \left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) dx\,dy$$

of some 1-form A dx + B dy + C dz. It is called closed if its
derived form is zero. Prove that an exact 2-form is always
closed

(a) by computation using the equality of mixed partials,
and

(b) by arguing directly from the meaning of the derived
form.


9 Give an example of a 1-form in three variables which is
closed but not exact. Of a 2-form which is closed but not
exact. Show that a closed 1-form which is defined at all points
of space is exact. [Use the method of Exercise 7, §3.2.]


10 Given a closed 2-form on space, define a 1-form on space
by saying that the value on a short line segment PQ is the
integral of the given 2-form over the triangle OPQ (O =
origin). Argue geometrically that this will prove that a closed
2-form defined on all of space is exact. Use this method to
show that the 2-form dy dz is exact.


3.4 Summary. Stokes’ Theorem

All the versions of the Fundamental Theorem stated 
above can be summarized by this statement: The integral 
of a k-form over the boundary of a (k + 1)-dimensional 
domain is equal to the integral over the domain itself of 
the derived (k + 1)-form found by the rules previously 
described. (These are reviewed below.) Intuitively the 
idea is that the given (k + 1)-dimensional domain S can 
be decomposed into a large number of very small 
(k + 1)-dimensional pieces and the integral over the 
boundary can therefore (by cancellation on the interior 
boundaries) be written as the sum of the integrals over the 
boundaries of the pieces. This sum, which is a sum over 
all pieces of a very fine subdivision of S, is the sort of 
sum whose limits define integrals over S. Passing to the 
limit, the integral over ∂S becomes an integral over S.
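The cancellation on interior boundaries is exact, and this is easy to see numerically. In the sketch below the 1-form P dx + Q dy on the unit square and the quadrature parameters are assumptions chosen for illustration. The boundary integrals of the n × n small squares are added up. Every interior edge is traversed once in each direction, so those contributions cancel, and what remains is the integral over the boundary of the whole square. For this form that integral equals 3/2, the integral of the derived form (3x² + 3y² − x) dx dy.

```python
import math

# Hypothetical 1-form P dx + Q dy; its derived form is (3x^2 + 3y^2 - x) dx dy.
def P(x, y): return -y ** 3 + x * y
def Q(x, y): return x ** 3 + math.sin(y)

def edge_integral(x0, y0, x1, y1, m):
    """Integral of P dx + Q dy along the segment from (x0, y0) to (x1, y1),
    midpoint rule with m nodes."""
    dx, dy = x1 - x0, y1 - y0
    total = 0.0
    for k in range(m):
        t = (k + 0.5) / m
        x, y = x0 + t * dx, y0 + t * dy
        total += P(x, y) * dx + Q(x, y) * dy
    return total / m

def square_boundary(x0, y0, h, m):
    """Integral over the counterclockwise boundary of [x0, x0+h] x [y0, y0+h]."""
    corners = [(x0, y0), (x0 + h, y0), (x0 + h, y0 + h), (x0, y0 + h)]
    return sum(edge_integral(*corners[i], *corners[(i + 1) % 4], m) for i in range(4))

def sum_over_pieces(n, m=4):
    """Sum of the boundary integrals of the n*n small squares of the unit square."""
    h = 1.0 / n
    return sum(square_boundary(i * h, j * h, h, m) for i in range(n) for j in range(n))

n, m = 20, 4
pieces = sum_over_pieces(n, m)
whole = square_boundary(0.0, 0.0, 1.0, n * m)  # same nodes along the outer edges
print(pieces, whole)
```

The outer square is integrated with n·m nodes per edge, which are exactly the nodes the small squares use along the outer boundary. The two printed numbers therefore differ only by rounding.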


Terminology and notation 


It is traditional to denote a k-form by the single Greek 
letter ω (omega). Then the Fundamental Theorem is
simply

$$\int_{\partial S} \omega = \int_S d\omega \tag{1}$$


which is called Stokes’ Formula or Stokes’ Theorem. The 
Stokes’ Formula (1) includes as a special case the 




Fundamental Theorem of Calculus ((1) of §3.1) if a 
function F(t) is regarded as a ‘0-form’ and if the ‘integral’ 
of a 0-form over the boundary of the oriented interval 
from a to b is defined to be F(b) − F(a).

Stokes’ Formula (1) is called by many different names. 
When k = 0 it is called the Fundamental Theorem of 
Calculus. The case (2) of §3.2 is called Green’s Theorem 
in the Plane. The case (3) of §3.3 is called the Divergence 
Theorem or Gauss’ Theorem. The case (2) of §3.3 is 
called Stokes’ Theorem. As was mentioned in §3.3, the 
formula is most difficult to prove in the case of Stokes’ 
Theorem (i.e. (2) of §3.3), which is the reason that the 
general formula (1) takes its name from this case. 

A k-form is also called a differential form, or an 
exterior differential form to distinguish these forms from 
quadratic forms, bilinear forms, homogeneous forms, 
and other sorts of forms which occur in the mathematical
vocabulary. The derived (k + 1)-form dω of a
k-form ω is also called the differential of ω (particularly
in the case k = 0) or the exterior derivative of ω. The
simplest terminology is just to call it dω (read ‘dee-omega’)
and this is the terminology which will be used in
the remainder of this book.

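The rules by which dω is defined are the rules

$$dA = \frac{\partial A}{\partial x}\,dx + \frac{\partial A}{\partial y}\,dy + \frac{\partial A}{\partial z}\,dz$$

(where A = A(x, y, z) is a function or ‘0-form’ in three
variables)

$$\begin{aligned}
d(A\,dx + \cdots) &= dA\,dx + \cdots\\
d(A\,dy\,dz + \cdots) &= dA\,dy\,dz + \cdots\\
dx\,dx &= 0\\
dx\,dy &= -dy\,dx, \quad\text{etc.}
\end{aligned}$$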
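The last two rules are all that is needed to multiply forms with constant coefficients. The sketch below is illustrative: the dictionary representation and the name `wedge` are assumptions. A form is stored as a map from increasing tuples of variable indices to coefficients. A repeated differential makes a term vanish (dx dx = 0), and each interchange of two adjacent differentials changes the sign (dx dy = −dy dx).

```python
from itertools import product

def wedge(alpha, beta):
    """Product of two forms with constant coefficients.

    A form is a dict mapping an increasing tuple of variable indices to a
    coefficient, e.g. {(0, 1): 3} stands for 3 dx dy when x, y, z are 0, 1, 2."""
    result = {}
    for (I, a), (J, b) in product(alpha.items(), beta.items()):
        idx = list(I + J)
        if len(set(idx)) < len(idx):
            continue  # a differential is repeated: dx dx = 0
        sign = 1
        # Bubble sort; each swap of adjacent differentials flips the sign.
        for i in range(len(idx)):
            for j in range(len(idx) - 1 - i):
                if idx[j] > idx[j + 1]:
                    idx[j], idx[j + 1] = idx[j + 1], idx[j]
                    sign = -sign
        key = tuple(idx)
        result[key] = result.get(key, 0) + sign * a * b
    return {k: v for k, v in result.items() if v != 0}

dx, dy, dz = {(0,): 1}, {(1,): 1}, {(2,): 1}
print(wedge(dy, dx))                                    # {(0, 1): -1}, i.e. -dx dy
print(wedge(wedge(dy, dz), dx))                         # {(0, 1, 2): 1}: dy dz dx = dx dy dz
print(wedge({(0,): 1, (1,): 1}, {(0,): 1, (1,): -1}))   # {(0, 1): -2}: (dx + dy)(dx - dy)
```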


A k-form ω is said to be differentiable if the coefficient
functions A, B, etc. have continuous first derivatives. In
the statement of Stokes’ Formula (1) above it is tacitly
assumed that the k-form ω is differentiable.

The rigorous proof of Stokes’ Theorem (1) must await 
a rigorous definition of the integrals involved. If S is a 
(k + 1)-dimensional rectangle lying in a coordinate 
direction, then S and all 2(k + 1) pieces of ∂S are explicitly
parameterized by the coordinate functions, so the 
integrals have been rigorously defined; in these cases 
Stokes’ Formula (1) has been rigorously proved above. 


Using the Chain Rule of Differentiation it is not difficult
(Exercises 7, 8, 9, 10) to prove the formula for simple
(k + 1)-dimensional polygons S (line segments, triangles,
tetrahedra), which implies that the formula holds for
polygonal curves, polygonal surfaces, and polyhedra S
(when k = 0, 1, 2 respectively). Given an ‘arbitrary’ S
and given a finely divided polygonal approximation $\tilde S$ of
S (so that $\partial\tilde S$ is a finely divided polygonal approximation
of ∂S), this implies $\int_{\partial\tilde S} \omega = \int_{\tilde S} d\omega$, and hence as $\tilde S \to S$
it implies $\int_{\partial S} \omega = \int_S d\omega$, provided that the intuitive
definitions of these integrals in terms of limits of polygonal
approximations (§2.2) are valid.


Exercises

1 Find dω for the following forms ω. [Note: It is not
necessary to indicate the domain of definition of a form in
order to differentiate it; e.g., d(x dy) = dx dy whether x dy
is thought of as a form on the xy-plane or on xyz-space.]


(a) xy dz + yz dx + zx dy

(b) x dy dz + y dz dx + z dx dy 

(c) ery2 

(d) (cos x) dy + (sin x) dz 

(e) $(x + y)^2\,dy + (x + y)^2\,dz$

(f) log x    (g) sin x
(h) $x^2$    (i) x


2 If R is an oriented line segment then $\int_{\partial R} x$ and $\int_R dx$ are
equal by definition of dx, and in particular have the same sign.
Similarly, the sign of dx dy can be defined by the equation
$\int_{\partial R} x\,dy = \int_R dx\,dy$; draw the coordinate axes in both
possible ways and show that this definition agrees with the
one previously given. Once dy dz has been defined then the
sign of dx dy dz is determined by the equation $\int_{\partial R} x\,dy\,dz = \int_R dx\,dy\,dz$.
Describe the orientation of a rectangular parallelepiped
in the coordinate directions which is positive with
respect to dx dy dz by describing the positive orientation of
one of its faces.


3 Show that d(dω) = 0 corresponds geometrically to the
statement that a boundary has no boundary. Prove analytically
that d(dω) = 0. [It suffices to show that d(dA) = 0 for
functions (0-forms) A, which was proved in Exercise 6, §3.2.]


4 The function

$$F(x, y) = \begin{cases} \dfrac{2xy(x^2 - y^2)}{x^2 + y^2}, & (x, y) \ne (0, 0)\\[1ex] 0, & (x, y) = (0, 0)\end{cases}$$

is continuous at (0, 0) because the quotient of two homogeneous
polynomials [in this case $2xy(x^2 - y^2)$ and $x^2 + y^2$]
in which the denominator is never zero has the limit zero at
the origin whenever the degree of the numerator [in this case,
4] is greater than that of the denominator [in this case, 2]. Find
$\partial F/\partial x$, $\partial F/\partial y$ and show that they are continuous functions. Show
that

$$\frac{\partial}{\partial x}\left(\frac{\partial F}{\partial y}\right) \quad\text{and}\quad \frac{\partial}{\partial y}\left(\frac{\partial F}{\partial x}\right)$$

both exist at the origin but that they are not equal. Why does
this not violate ‘equality of mixed partials’?

[Figure: A reasonable closed curve in the plane is the boundary of an oriented domain.]


5 Let D be a (reasonable) domain in the plane. If any two 
points P, Q in D can be joined by a curve S, what conclusion
can be drawn about a function F, defined throughout D, for 
which dF = 0? Give an example to show that the conclusion 
does not necessarily hold without the assumption that points 
can be joined by curves. 


6 Let D be a (reasonable) domain in the plane. If D has the
property that every (reasonable) closed curve in D is a
boundary (that is, given an oriented curve there is an oriented
domain S such that the given curve is ∂S), what conclusion
can be drawn about a 1-form ω defined throughout D for
which dω = 0? Give an example to show that the conclusion
does not necessarily hold without the assumption on D.


7 Prove that if S is the oriented triangle with vertices (0, 0),
(1, 0), (0, 1) in that order, and if A(x, y) is continuous, then
$\int_S A\,dx\,dy$ can be written as an iterated integral

$$\int_0^1 \left[\int_0^{1-x} A(x, y)\,dy\right] dx \quad\text{or}\quad \int_0^1 \left[\int_0^{1-y} A(x, y)\,dx\right] dy.$$

[Use the method of Exercise 2, §2.6.]


8 Prove that Stokes’ Formula $\int_{\partial S} \omega = \int_S d\omega$ holds for ω
a differentiable 1-form on the oriented triangle of Exercise 7.


9 Prove that Stokes’ Formula $\int_{\partial S} \omega = \int_S d\omega$ holds whenever
S is an oriented triangle in xyz-space parameterized, by
an affine map, on the triangle (0, 0), (1, 0), (0, 1) in the
uv-plane (and whenever ω is a differentiable 1-form on S).
[Use the Chain Rule.]


10 Outline a proof that $\int_{\partial S} \omega = \int_S d\omega$ holds for polygonal
surfaces S and differentiable 1-forms ω.


linear algebra 


chapter 4 


4.1 Introduction

Linear algebra is the study of a very simple but very
important type of problem, of which the following is 
typical: Certain quantities are known to satisfy relations 


$$\begin{aligned}
p &= 2S + 4T + 3U - V + 2\\
q &= S + 2T + 3U + V - 3\\
r &= 4S + 8T + 10U + 2V + 2.
\end{aligned}\tag{1}$$


Given values of (p, q, r), find all possible values of
(S, T, U, V).

The method of solution is nothing more than the 
step-by-step elimination of variables. Although certain 
combinations such as p − q or r − 4q immediately
suggest themselves for this particular case, it is better to 
organize the elimination in a systematic way which will 
be applicable to all such problems. The organizing 
principle which will be used is to move the unknown 
quantities one by one to the left-hand side, thus expressing 
them in terms of the others. In the system (1) one can 
solve the first equation for V 


$$V = -p + 2S + 4T + 3U + 2$$


and use the result to eliminate V from the other two 
equations 


$$\begin{aligned}
q &= -p + 3S + 6T + 6U - 1\\
r &= -2p + 8S + 16T + 16U + 6.
\end{aligned}$$




H.M. Edwards, Advanced Calculus: A Differential Forms Approach, Modern Birkhauser Classics, 
DOI 10.1007/978-0-8176-8412-9_4, © Harold M. Edwards 2014 




*Notation. Although in specific problems it is always preferable
to give separate names to the variables, such as (p, q, r) or
(S, T, U, V), this is inconvenient in open-ended problems where
the number of variables is unspecified. It was once common to
write (x, y, ..., z) or (a, b, ..., c, ..., d) to indicate an
unspecified number of variables of which x, y were the first two
and z the last, or of which a, b were the first two, c some
variable in between, and d the last. This notation has obvious
drawbacks (e.g. a statement about two variables in between
requires separate explanations), and the subscript notation is now
prevalent. In this notation the natural numbers 1, 2, 3, ... are
used to index the variables so that $(x_1, x_2, \ldots, x_i, \ldots, x_n)$
denotes an unspecified number n of variables of which a typical
one is denoted $x_i$, where i is a natural number ≤ n. This makes
possible the very compact notation (1’) for m equations in n
unknowns:

$$\begin{aligned}
y_1 &= a_{11}x_1 + \cdots + a_{1n}x_n + b_1\\
&\;\;\vdots\\
y_m &= a_{m1}x_1 + \cdots + a_{mn}x_n + b_m.
\end{aligned}$$


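One of these two remaining equations can be solved for
one of the remaining unknowns (S, T, U), say

$$S = \tfrac{1}{3}p + \tfrac{1}{3}q - 2T - 2U + \tfrac{1}{3},$$

and used to eliminate S from the other equations:

$$\begin{aligned}
V &= -\tfrac{1}{3}p + \tfrac{2}{3}q - U + \tfrac{8}{3}\\
r &= \tfrac{2}{3}p + \tfrac{8}{3}q + \tfrac{26}{3}.
\end{aligned}$$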


Since the equation for r now contains none of the un- 
knowns the process can go no further. The conclusion is 
that the given system of equations (1) is equivalent to the 
system 


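$$\begin{aligned}
V &= -\tfrac{1}{3}p + \tfrac{2}{3}q - U + \tfrac{8}{3}\\
S &= \tfrac{1}{3}p + \tfrac{1}{3}q - 2T - 2U + \tfrac{1}{3}\\
r &= \tfrac{2}{3}p + \tfrac{8}{3}q + \tfrac{26}{3}.
\end{aligned}\tag{2}$$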


In this form the original problem “given (p, q, r) find all 
possible values of (S, T, U,V)” can be solved im- 
mediately: If (p,q, r) do not satisfy the relation 


$$r = \tfrac{2}{3}p + \tfrac{8}{3}q + \tfrac{26}{3}$$


then there is no solution; if they do satisfy this relation 
then values of T, U can be chosen arbitrarily and V, S 
determined by the relations 


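$$\begin{aligned}
S &= \tfrac{1}{3}p + \tfrac{1}{3}q - 2T - 2U + \tfrac{1}{3}\\
V &= -\tfrac{1}{3}p + \tfrac{2}{3}q - U + \tfrac{8}{3}.
\end{aligned}$$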


This gives all possible solutions (S, T, U, V). 
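The claim that (2) gives all solutions can be spot-checked with exact rational arithmetic. Pick any (p, q), take r from the relation, choose T and U freely, compute S and V from (2), and substitute into (1). The function names below are illustrative assumptions. Python's `fractions` module keeps the thirds exact.

```python
from fractions import Fraction as Fr

def system_1(S, T, U, V):
    """The map (1): (S, T, U, V) -> (p, q, r)."""
    p = 2 * S + 4 * T + 3 * U - V + 2
    q = S + 2 * T + 3 * U + V - 3
    r = 4 * S + 8 * T + 10 * U + 2 * V + 2
    return p, q, r

def required_r(p, q):
    """The relation in (2) that (p, q, r) must satisfy."""
    return Fr(2, 3) * p + Fr(8, 3) * q + Fr(26, 3)

def solve_2(p, q, T, U):
    """S and V from (2), with T and U chosen arbitrarily."""
    S = Fr(1, 3) * p + Fr(1, 3) * q - 2 * T - 2 * U + Fr(1, 3)
    V = -Fr(1, 3) * p + Fr(2, 3) * q - U + Fr(8, 3)
    return S, V

for p, q, T, U in [(1, 2, 5, -1), (0, 0, 0, 0), (-4, 7, Fr(1, 2), 3)]:
    S, V = solve_2(p, q, T, U)
    assert system_1(S, T, U, V) == (p, q, required_r(p, q))
print("all checks passed")
```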

Writing the given equations (1) in the form (2) can 
therefore be regarded as a solution of the problem of 
finding all possible (S, T, U, V) given (p, q, r). Similarly, 
a system of m equations in n unknowns can be solved by 
rewriting it in a form which gives the values of as many 
of the unknown quantities as possible in terms of the 
remaining ones. Thus a system* 

$$y_i = \sum_{j=1}^{n} a_{ij}\,x_j + b_i \tag{1'}$$

is solved by the following process: Choose one of the
unknowns $x_1, x_2, \ldots, x_n$, solve one of the equations for
this unknown, and substitute the result into the remaining
equations. For the sake of simplicity assume that the
first equation can be solved for $x_1$ (i.e. assume $a_{11} \ne 0$)




and that this is the choice that is made. The result is a
new set of equations

$$\begin{aligned}
x_1 &= \text{combination of } y_1, x_2, x_3, \ldots, x_n\\
y_2 &= \text{combination of } y_1, x_2, x_3, \ldots, x_n\\
&\;\;\vdots\\
y_m &= \text{combination of } y_1, x_2, x_3, \ldots, x_n
\end{aligned}$$

in which one x has been eliminated from the right-hand
side in favor of one of the y’s. Next, one of the remaining
equations with a y on the left is solved for one of the
remaining x’s on the right and the result substituted into
all the other equations. Assuming for the sake of simplicity
that it is the second equation which is solved for
$x_2$, the result is a new system

$$\begin{aligned}
x_1 &= \text{combination of } y_1, y_2, x_3, \ldots, x_n\\
x_2 &= \text{combination of } y_1, y_2, x_3, \ldots, x_n\\
y_3 &= \text{combination of } y_1, y_2, x_3, \ldots, x_n\\
&\;\;\vdots\\
y_m &= \text{combination of } y_1, y_2, x_3, \ldots, x_n
\end{aligned}$$

and so forth. It may well happen that a point is reached
where the next equation cannot be solved for the next
unknown (if $a_{11} = 0$ it would not have been possible to
solve the first equation for the first unknown) even
though all unknowns have not been eliminated from all
the remaining equations. In this event it is convenient to
rearrange and renumber the remaining equations and/or
unknowns so that at the rth stage the equations express
$(x_1, x_2, \ldots, x_r, y_{r+1}, \ldots, y_m)$ as combinations of
$(y_1, y_2, \ldots, y_r, x_{r+1}, \ldots, x_n)$. The process terminates
when none of the remaining equations contains any of
the remaining unknowns. At this point the equations are
of the form


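$$\begin{aligned}
&\text{(2a')}\quad x_i = \sum_{j=1}^{r} A_{ij}\,y_j + \sum_{j=r+1}^{n} B_{ij}\,x_j + C_i \qquad (i = 1, 2, \ldots, r)\\
&\text{(2b')}\quad y_i = \sum_{j=1}^{r} D_{ij}\,y_j + E_i \qquad (i = r+1, \ldots, m)
\end{aligned}\tag{2'}$$

(where $A_{ij}, B_{ij}, C_i, D_{ij}, E_i$ are numbers) expressing
$x_1, \ldots, x_r$ as combinations of $(y_1, \ldots, y_r, x_{r+1}, \ldots, x_n)$
and $y_{r+1}, \ldots, y_m$ as combinations of $(y_1, \ldots, y_r)$. Now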




*The word ‘onto’ has become an adjective in mathematics. A mapping
f: A → B of a set A to a set B is said to be onto if every element of B
is f of some element of A.


(2’) is a solution of (1’) in that it enables one to find all
possible sets of values $(x_1, x_2, \ldots, x_n)$ corresponding to
a given set of values $(y_1, y_2, \ldots, y_m)$: If the given
$(y_1, y_2, \ldots, y_m)$ do not satisfy the m − r relations (2b’)
then no set of values $(x_1, x_2, \ldots, x_n)$ can be substituted
in the right side of (1’) to obtain the given $(y_1, y_2, \ldots, y_m)$
on the left. On the other hand, if the relations (2b’) are
satisfied then there do exist such sets of values
$(x_1, x_2, \ldots, x_n)$, and all such sets of values
can be obtained by choosing $(x_{r+1}, x_{r+2}, \ldots, x_n)$ arbitrarily
and using the equations (2a’) to determine
$(x_1, x_2, \ldots, x_r)$.
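The elimination process can be written as a short program. The sketch below is one possible implementation. The name `reduce_system`, the string labels, and the rule "solve for the first available unknown" are assumptions. Each step solves one equation that still has a y on the left for an x on the right and substitutes the result into every other equation, as described above. Run on system (1), it happens to solve for S and U rather than V and S. It still reaches the same rank r = 2 and the same relation r = (2/3)p + (8/3)q + 26/3.

```python
from fractions import Fraction

def reduce_system(a, b):
    """Reduce y_i = sum_j a[i][j] x_j + b[i] by exchange steps.

    Returns (rows, cols, M, c, r): equation i reads
        rows[i] = sum_j M[i][j] * cols[j] + c[i],
    and r is the number of exchanges made (the rank)."""
    m, n = len(a), len(a[0])
    M = [[Fraction(v) for v in row] for row in a]
    c = [Fraction(v) for v in b]
    rows = [f"y{i + 1}" for i in range(m)]
    cols = [f"x{j + 1}" for j in range(n)]
    r = 0
    for i in range(m):
        # Look for an x still on the right side of equation i.
        j = next((col for col in range(n)
                  if cols[col].startswith("x") and M[i][col] != 0), None)
        if j is None:
            continue  # equation i is now a relation among the y's only
        p = M[i][j]
        # Solve equation i for x_j; the old left side (a y) takes column j.
        new_row = [-v / p for v in M[i]]
        new_row[j] = 1 / p
        new_c = -c[i] / p
        # Substitute the result into every other equation containing x_j.
        for k in range(m):
            f = M[k][j]
            if k != i and f != 0:
                M[k] = [M[k][col] + f * new_row[col] if col != j else f * new_row[j]
                        for col in range(n)]
                c[k] += f * new_c
        M[i], c[i] = new_row, new_c
        rows[i], cols[j] = cols[j], rows[i]
        r += 1
    return rows, cols, M, c, r

# System (1): x1, x2, x3, x4 stand for S, T, U, V and y1, y2, y3 for p, q, r.
a = [[2, 4, 3, -1], [1, 2, 3, 1], [4, 8, 10, 2]]
b = [2, -3, 2]
rows, cols, M, c, r = reduce_system(a, b)
print("rank", r)
for lhs, coeffs, const in zip(rows, M, c):
    terms = " + ".join(f"({v}){name}" for v, name in zip(coeffs, cols) if v != 0)
    print(lhs, "=", terms, "+", const)
```

The last printed line is the single relation (2b’) among the y’s. The first two lines give the unknowns solved for, in terms of the y’s used and the unknowns left free (T and V here).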

This simple elimination process is the core of linear 
algebra and should be examined from as many points of 
view as possible. In particular, a ‘geometrical’ formulation
is very helpful in understanding the nature of systems
of linear equations such as (1) and (1’). The word
‘geometrical’ is in quotes because most often there are 
more than three variables, which means that the spaces 
involved generally have more than three dimensions and 
hence cannot actually be visualized. Nonetheless, linear 
algebra makes only very simple statements about these 
higher dimensional spaces, and it is surprisingly easy to 
develop an intuitive understanding of these statements as 
generalizations of statements about planes and lines in 
space. 

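A system of two equations in three unknowns, e.g.

$$\begin{aligned}
u &= 2x + 4y - z + 1\\
v &= x - y + 4
\end{aligned}\tag{1''}$$

can be visualized as a mapping of space with coordinates
(x, y, z) to a plane with coordinates (u, v). The elimination
gives

$$\begin{aligned}
z &= -u + 2x + 4y + 1\\
v &= x - y + 4
\end{aligned}$$

and hence

$$\begin{aligned}
z &= -u - 4v + 6x + 17\\
y &= -v + x + 4.
\end{aligned}\tag{2''}$$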


Thus there are no relations between u and v, and given 
(u, v) it is always possible to find (x, y, z). This means 
that the mapping of space to a plane defined by the system
(1’’) is ‘onto’,* that is, every point of the uv-plane is




*In ordinary usage, ‘space’ of course means three dimensions. In
mathematics the different words line, plane, space are often
inconvenient, and it is often useful to refer to a line as a
‘one-dimensional space’ or a plane as a ‘two-dimensional space’.
Similarly the set of all quadruples of real numbers is called
‘four-dimensional space’, etc.

†A set such as (3) where a given function has a given value will be
called a ‘level surface’ regardless of its dimension.


the image of some point of xyz-space. For each particular
point of the uv-plane, say (u, v) = (2, 3), the set
of all points of xyz-space which are mapped to the given
point form a line in xyz-space, in this case the line

$$\begin{aligned}
z &= -2 - 4\cdot 3 + 6x + 17 = 6x + 3\\
y &= -3 + x + 4 = x + 1.
\end{aligned}$$

Similarly, the points of xyz-space for which (u, v) = (1, 1)
are the points of the line

$$\begin{aligned}
z &= 6x + 12\\
y &= x + 3.
\end{aligned}$$

It should be noted that this line is parallel to the previous
one and that in fact the equations

$$u = \text{const.}, \qquad v = \text{const.} \tag{3}$$

always describe a line

$$\begin{aligned}
z &= 6x + \text{const.}\\
y &= x + \text{const.}
\end{aligned}$$

parallel to these lines. Geometrically, the map (1’’) describes
a map from a 3-dimensional space to a 2-dimensional
space*. The ‘level surfaces’† of this map (that is,
the sets of all (x, y, z) for which the map has a given value
(u, v)) are a family of parallel lines in xyz-space.
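This picture can be checked with exact integer arithmetic. In the sketch below the function names are illustrative. `preimage` implements (2’’) with x as the free parameter. Stepping along the direction (1, 1, 6) never changes (u, v), which is why all the level sets are parallel lines.

```python
def map_1pp(x, y, z):
    """The map (1''): (x, y, z) -> (u, v)."""
    return 2 * x + 4 * y - z + 1, x - y + 4

def preimage(u, v, x):
    """(2''): the point with first coordinate x on the level set of (u, v)."""
    return x, -v + x + 4, -u - 4 * v + 6 * x + 17

for u, v in [(2, 3), (1, 1), (-5, 0)]:
    for x in range(-3, 4):
        assert map_1pp(*preimage(u, v, x)) == (u, v)

# Moving by (1, 1, 6) changes u by 2 + 4 - 6 = 0 and v by 1 - 1 = 0.
P = preimage(2, 3, 0)
Q = (P[0] + 1, P[1] + 1, P[2] + 6)
print(P, map_1pp(*P), map_1pp(*Q))
```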

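A system of 3 equations in two unknowns such as

$$\begin{aligned}
x &= 3u - v + 2\\
y &= u + 2v\\
z &= 2u + v + 1
\end{aligned}$$

can be visualized as a mapping of a two-dimensional
space to a three-dimensional one. The elimination gives

$$\begin{aligned}
v &= -x + 3u + 2\\
y &= -2x + 7u + 4\\
z &= -x + 5u + 3
\end{aligned}$$

and then

$$\begin{aligned}
v &= -\tfrac{1}{7}x + \tfrac{3}{7}y + \tfrac{2}{7}\\
u &= \tfrac{2}{7}x + \tfrac{1}{7}y - \tfrac{4}{7}\\
z &= \tfrac{3}{7}x + \tfrac{5}{7}y + \tfrac{1}{7}.
\end{aligned}$$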


*A function f: A → B is said to be one-to-one if no two points in the
domain are carried to the same point of the range, i.e. if $f(a_1) = f(a_2)$
implies $a_1 = a_2$.

In this case a point of xyz-space is the image of some
point in uv-space only if the last relation is satisfied, i.e.
only if (x, y, z) lies on the plane

$$7z = 3x + 5y + 1. \tag{4}$$

This plane is called the image of the map. For each point
of the image there is exactly one point (u, v) given by the
first two equations; thus the level surfaces of the map are
individual points and the mapping is one-to-one.*
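The same kind of check works for this map. Every image point lies on the plane (4). The first two equations of the final elimination recover (u, v) from (x, y, z), so distinct points (u, v) have distinct images. Fractions keep the sevenths exact. The function names are assumptions made for illustration.

```python
from fractions import Fraction as Fr

def forward(u, v):
    """The map (u, v) -> (x, y, z) of this example."""
    return 3 * u - v + 2, u + 2 * v, 2 * u + v + 1

def back(x, y):
    """(u, v) from the first two equations obtained by elimination."""
    u = Fr(2, 7) * x + Fr(1, 7) * y - Fr(4, 7)
    v = -Fr(1, 7) * x + Fr(3, 7) * y + Fr(2, 7)
    return u, v

for u in range(-3, 4):
    for v in [Fr(-1, 2), 0, 5]:
        x, y, z = forward(u, v)
        assert 7 * z == 3 * x + 5 * y + 1      # the image lies on the plane (4)
        assert back(x, y) == (u, v)            # (u, v) is recovered: one-to-one
print("ok")
```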

The original example (1) can be regarded as a mapping 
from four-dimensional STUV-space to three-dimensional 
pqr-space. The elimination (2) shows that not every point
of pqr-space is the image of a point in the STUV-space, 
but only those points which lie on the plane 


(4’) 3r = 2p + 8q + 26. 


This plane is the image of the map (1). For each point 
(p,q, r) of the image, the first two equations of (2) define 
a two-dimensional set (parameterized by T, U) where (1) 
has this value. Thus the level surfaces of (1) are planes in 
STUV-space. 

The general system of equations (1’) and its solution
(2’) can be described ‘geometrically’ in a similar way:
The equations (2b’) describe the image of the map (1’).
It is an r-dimensional subspace of the m-dimensional
$y_1y_2\ldots y_m$-space. The level surfaces of the map (1’) are
the (n − r)-dimensional subspaces of $x_1x_2\ldots x_n$-space
described by the equations (2a’). Note in particular that
the map (1’) is onto if and only if r = m, and is one-to-one
if and only if r = n.

Reverting to a more algebraic terminology, the integer
r in (2’) can be described as the number of independent y’s,
since (2b’) expresses all of the y’s in terms of the r
independent values $(y_1, y_2, \ldots, y_r)$. Fixing the values of
the y’s imposes only r independent conditions on the x’s
(leaving n − r degrees of freedom), since the equations
(2a’) express all of the x’s in terms of the y’s and in
terms of the n − r independent values $(x_{r+1}, x_{r+2}, \ldots, x_n)$.
The integer r is called the rank of the system (1’).

This discussion leaves unanswered the very important 
question of which sets of r of the y’s are independent and 
which sets of (n — r) of the x’s are independent when the 
y’s are given. In terms of the elimination process, this is 
the question: “When can a given set of r variables be 
eliminated from a given set of r equations?” This ques- 
tion is very conveniently answered in terms of the algebra 




of forms of Chapter 1. For example: The original 
mapping (1) has rank r = 2. This is reflected in the fact 
that its image is the 2-dimensional plane (4’). Geo- 
metrically this implies that any 3-dimensional solid in 
STUV-space is carried by the map (1) to a figure with no 
volume (because it lies in a plane); hence, by the geo- 
metrical meaning of pullback, one would expect that the 
pullback of oriented volume dp dq dr under the map (1) 
would be zero. This is easily verified: 


$$\begin{aligned}
dp\,dq\,dr &= [2\,dS + 4\,dT + 3\,dU - dV][dS + 2\,dT + 3\,dU + dV][4\,dS + 8\,dT + 10\,dU + 2\,dV]\\
&= [(4 - 4)\,dS\,dT + (6 - 3)\,dS\,dU + (2 + 1)\,dS\,dV + (12 - 6)\,dT\,dU + (4 + 2)\,dT\,dV + (3 + 3)\,dU\,dV]\\
&\qquad \times [4\,dS + 8\,dT + 10\,dU + 2\,dV]\\
&= [0\,dS\,dT + 3\,dS\,dU + 3\,dS\,dV + 6\,dT\,dU + 6\,dT\,dV + 6\,dU\,dV][4\,dS + 8\,dT + 10\,dU + 2\,dV]\\
&= (0 - 24 + 24)\,dS\,dT\,dU + (0 - 24 + 24)\,dS\,dT\,dV\\
&\qquad + (6 - 30 + 24)\,dS\,dU\,dV + (12 - 60 + 48)\,dT\,dU\,dV\\
&= 0.
\end{aligned}$$


The fact that the first two equations of (1) can be solved
for (V, S) as functions of (p, q, T, U) is reflected in the
fact that the dS dV component of dp dq is not zero.
Geometrically this can be seen as the statement that the
planes T = const., U = const. (coordinatized by S, V)
are mapped one-to-one onto the pq-plane (because this 
map of the SV-plane to the pq-plane multiplies oriented 
areas by 3 and therefore doesn’t collapse the plane to a 
line or a point); hence T, U, p, q suffice to determine S, V. 
In the same way, since the pullback of dp dq is 


$$dp\,dq = 0\,dS\,dT + 3\,dS\,dU + 3\,dS\,dV + 6\,dT\,dU + 6\,dT\,dV + 6\,dU\,dV,$$


it is to be expected that (S, U) can be expressed as 
functions of (p, q, T, V), that (T, V) can be expressed as 
functions of (p,q, S, U ), etc., except that (S, T) cannot 
be expressed as functions of (p,q, U,V) because for 
U = const., V = const., the map (S, T) ↦ (p, q) multiplies
oriented areas by zero, which means that the map
collapses planes coordinatized by (S, T) to lines or points. 
These conclusions are immediately verified by directly 
solving the relevant equations by step-by-step elimination. 
Since the coefficients of S, T in p, q are proportional, any 
combination of p, q which eliminates S must also elimi- 
nate T. 
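The coefficients in these computations are determinants. By the rules dx dy = −dy dx and dx dx = 0, the coefficient of dS dU (say) in a product of two 1-forms is the 2 × 2 determinant formed from their dS and dU coefficients. Products of three 1-forms work the same way with 3 × 3 determinants. The sketch below uses this to recompute dp dq and dp dq dr for system (1); the helper names are assumptions. The only vanishing component of dp dq is the dS dT one. This matches the conclusion that (S, T) cannot be found from (p, q, U, V).

```python
from itertools import combinations, permutations

def sign(perm):
    """+1 or -1 according to the parity of the number of inversions."""
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s

def det(mat):
    """Determinant by the permutation expansion (fine for small matrices)."""
    total = 0
    for perm in permutations(range(len(mat))):
        term = sign(perm)
        for i, j in enumerate(perm):
            term *= mat[i][j]
        total += term
    return total

def product_components(forms, names):
    """Components of the product of the given 1-forms (rows of coefficients)."""
    k = len(forms)
    return {"".join("d" + names[j] for j in cols):
            det([[f[j] for j in cols] for f in forms])
            for cols in combinations(range(len(names)), k)}

names = ["S", "T", "U", "V"]
dp, dq, dr = [2, 4, 3, -1], [1, 2, 3, 1], [4, 8, 10, 2]
print(product_components([dp, dq], names))
print(product_components([dp, dq, dr], names))
```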

Analogous considerations apply to the general system
(1’). If its rank is r then its image is r-dimensional and the
projection of the image on any (r + 1)-dimensional
‘coordinate plane’ in y-space has no ‘(r + 1)-dimensional
volume’. Thus it is to be expected that the pullback of
any (r + 1)-form in the y’s is zero. On the other hand,
the equations (2b’) show that the projection of the image
onto the $y_1y_2\ldots y_r$-coordinate ‘plane’ is one-to-one and
onto; hence there is an r-dimensional figure in x-space
whose image in y-space has a non-trivial projection on
the $y_1y_2\ldots y_r$-coordinate plane. Thus it is to be expected
that the pullback of $dy_1\,dy_2\ldots dy_r$ is not zero. More
generally, any set of r of the y’s can be used to parameterize
the image of (1’) (that is, all y’s can be expressed
in terms of these) if and only if the projection of
the image on the corresponding r-dimensional coordinate
plane is one-to-one and onto, which is true if and only if
the pullback of the corresponding r-form is not zero. (In
(1) the pullbacks of dp dr and dq dr are not zero, and, as is
predicted by the above, the equation (4’) can be solved
for q in terms of p and r or for p in terms of q and r.)
Finally, the equations (2a’) show that for any fixed
values of $x_{r+1}, x_{r+2}, \ldots, x_n$ the map (1’) composed
with projection on the $y_1y_2\ldots y_r$-plane is onto; therefore
there is an r-dimensional figure in the plane
$x_{r+1} = \text{const.}, \ldots, x_n = \text{const.}$ such that the value of $dy_1\,dy_2\ldots dy_r$
on the image is not zero. Since all r-forms in x
except $dx_1\,dx_2\ldots dx_r$ are zero on this figure (its projection
on other coordinate planes is not r-dimensional),
this implies that the $dx_1\,dx_2\ldots dx_r$-component of the
pullback of $dy_1\,dy_2\ldots dy_r$ is not zero. More generally,
any set of r of the x’s can be eliminated from the equations
for $y_1, y_2, \ldots, y_r$ if and only if the corresponding
component of the pullback of $dy_1\,dy_2\ldots dy_r$ is not zero.
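In summary: A set of equations of the form

$$y_i = \sum_{j=1}^{n} a_{ij}\,x_j + b_i \qquad (i = 1, 2, \ldots, m) \tag{1'}$$

can be put in the form

$$\begin{aligned}
&\text{(2a')}\quad x_i = \sum_{j=1}^{r} A_{ij}\,y_j + \sum_{j=r+1}^{n} B_{ij}\,x_j + C_i \qquad (i = 1, 2, \ldots, r)\\
&\text{(2b')}\quad y_i = \sum_{j=1}^{r} D_{ij}\,y_j + E_i \qquad (i = r+1, r+2, \ldots, m)
\end{aligned}\tag{2'}$$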


by (possibly) rearranging the x’s and y’s and eliminating
as many x’s as possible from the right-hand side. The
integer r is called the rank of the system (1’). Algebraically
the rank can be thought of as the number of independent
y’s and geometrically it can be thought of as the dimension
of the image of the mapping (1’). These interpretations
of the rank make the following assertions plausible:
The pullback of any (r + 1)-form in the y’s, found by the
familiar computational rules, is zero. A given set of r of
the y’s is independent and can be used to find the remaining
(m − r) values if and only if the pullback of the
corresponding r-form in the y’s is not zero. When this is
the case, these y’s can be used to eliminate a given set of
r of the x’s if and only if the corresponding term of the
pullback is not zero. For example, the form (2’) can be
achieved without rearrangement of the x’s and y’s if and
only if the coefficient of $dx_1\,dx_2\ldots dx_r$ in the pullback
of $dy_1\,dy_2\ldots dy_r$ is not zero.

These conclusions are contained, in a somewhat more
concise form, in the Implicit Function Theorem, which is
stated and proved in §4.4. The two intervening sections
are devoted to developing needed definitions and
notation.


Exercises

1 For each of the following systems find one reduction to 
the form (2’), the value of r, and all possible choices of r 
independent variables (variables on the right) and r dependent 
variables (variables on the left) for which such a reduction is 
possible. 


(a) $u = 3x + 2y + 1$
    $v = 2x + y - 3$
(b) $x = 2t + 1$
    $y = t + 2$
    $z = 4t - 3$
(c) $v = x + 2y - 2z + 7t + 4$
(d) $u = 2x + y - z$
    $v = -4x + 3y + 2z - 4$
(e) $a = 3p + q + 4$
    $b = 2p - q + 2$
    $c = p + q - 1$
(f) $u = 2x + y - z - 4t + 1$
    $v = -4x + 3y + 2z - 12t + 1$
(g) $x = p + q + 3r - 7$
    $y = 2p + r$
    $z = 7p + 2q + r + 2$
(h) $u = 3x + 2y + 8z$
    $v = x - 3y - z$
    $w = 4x + y + 9z$


2 Solve the equations of Exercise 7, §1.3, for x, y as functions 
of u, v. Review this exercise. 


3 Consider the triangle in 4-dimensional space whose vertices are $(4, 4, -3, 4)$, $(8, 8, 3, 6)$, $(2, 0, -6, 2)$. Draw its projection on each of the 6 two-dimensional coordinate planes. Find a mapping

$$\begin{aligned} x &= a_{11}u + a_{12}v + b_1\\ y &= a_{21}u + a_{22}v + b_2\\ z &= a_{31}u + a_{32}v + b_3\\ t &= a_{41}u + a_{42}v + b_4 \end{aligned}$$

of the $uv$-plane to $xyzt$-space which carries the triangle $(0, 0)$, $(1, 0)$, $(1, 1)$ to the given triangle. Can the coefficients $a_{ij}$, $b_i$ be chosen in more than one way? Which pairs of the equations for $x, y, z, t$ can be used to eliminate $u, v$ from the right-hand side? Relate the answer to the first part of the exercise. Find the pullbacks of the 2-forms $dx\,dy$, $dx\,dz$, etc. to $uv$-space.


4 Write the tetrahedron with vertices $(2, 4, 1, -1)$, $(-1, 1, 4, 2)$, $(0, 0, 2, -1)$, $(1, 3, 3, 3)$ as the image under a map

$$\begin{aligned} x &= a_{11}u + a_{12}v + a_{13}w + b_1\\ y &= a_{21}u + a_{22}v + a_{23}w + b_2\\ z &= a_{31}u + a_{32}v + a_{33}w + b_3\\ t &= a_{41}u + a_{42}v + a_{43}w + b_4 \end{aligned}$$

of the tetrahedron in $uvw$-space with vertices $(0, 0, 0)$, $(1, 0, 0)$, $(1, 1, 0)$, $(1, 1, 1)$. Can this be done in more than one way? Find the pullbacks of each of the 4 basic 3-forms on $xyzt$-space. Which triples of these equations can be used to eliminate $u, v, w$ from the right-hand side?


5 Show that the equations

$$\begin{aligned} u &= a_{11}x + a_{12}y + a_{13}z + b_1\\ v &= a_{21}x + a_{22}y + a_{23}z + b_2 \end{aligned}$$

can be solved for $x, y$ as functions of $u, v, z$ if and only if the $dx\,dy$-component of the pullback of $du\,dv$ is not zero. [Form $a_{22}u - a_{12}v$ and $a_{21}u - a_{11}v$. If $a_{11}a_{22} - a_{12}a_{21} \neq 0$ this gives the solution. Otherwise it shows that the map is not onto for fixed $z$; hence there is no such solution.]


6 Use the method of Exercise 5 to prove a necessary and sufficient condition for it to be possible to eliminate $u, v$ from the first two equations of the system

$$\begin{aligned} x &= a_{11}u + a_{12}v + b_1\\ y &= a_{21}u + a_{22}v + b_2\\ z &= a_{31}u + a_{32}v + b_3. \end{aligned}$$

Assuming the condition is met, write the final form (2’) of the solution explicitly in terms of the $a$’s and $b$’s.


7 Under what conditions on the $a$’s and $b$’s is the map of Exercise 5 of rank one? Describe geometrically, i.e. give dimensions of range, domain, image, and level surfaces.


8 Under what conditions on the $a$’s and $b$’s is the map of Exercise 6 of rank one? Describe geometrically.


9 In the main example (1) find the pullbacks of $dq\,dr$ and $dp\,dr$, showing that they are multiples of $dp\,dq$. Derive the same result, along with the specific multiples, using the last of the equations (2).


4.2


Constant k-Forms on n-Space


*In this chapter only constant forms will be considered. Therefore, as in Chapter 1, the word ‘constant’ will be omitted.


As the discussion of §4.1 indicates, the algebra of forms is useful in the linear algebra of higher dimensions. This algebra of forms is described very succinctly by the rules

$$dx_i\,dx_i = 0, \qquad dx_i\,dx_j = -dx_j\,dx_i. \tag{1}$$

A $k$-form* in $x_1, x_2, \ldots, x_n$ is described by an expression which is a sum of terms of the form $A\,dx_{i_1}\,dx_{i_2}\cdots dx_{i_k}$, where $A$ is a number and $i_1, i_2, \ldots, i_k$ are integers $1 \le i_j \le n$. The pullback of a $k$-form in $y_1, y_2, \ldots, y_m$ under an affine map

$$y_i = \sum_{j=1}^{n} a_{ij}x_j + b_i \tag{2}$$

is the $k$-form in $x_1, x_2, \ldots, x_n$ obtained by performing the substitution $dy_i = \sum_j a_{ij}\,dx_j$ and using the rules (1) together with the distributive law of multiplication. In short, the algebraic rules of 1-, 2-, and 3-forms in 3 variables, which were described and used in Chapter 1, will now be applied to cases where there are more than 3 variables.
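As a minimal sketch of this algebra in Python (the dictionary representation and the names `normalize`, `multiply` and `pullback` are assumptions made for illustration), a $k$-form can be stored as a map from index tuples $(i_1, \ldots, i_k)$ to coefficients $A$. The rules (1) bring every term to increasing indices, and the pullback under (2) is literally the substitution $dy_i = \sum_j a_{ij}\,dx_j$ followed by the distributive law. Indices are 0-based, and the constants $b_i$ play no role.

```python
from collections import defaultdict


def normalize(form):
    """Apply dx_i dx_i = 0 and dx_i dx_j = -dx_j dx_i until every index tuple is increasing."""
    result = defaultdict(int)
    for idx, coeff in form.items():
        if len(set(idx)) < len(idx):
            continue  # a repeated factor makes the whole term 0
        inversions = sum(
            1 for a in range(len(idx)) for b in range(a + 1, len(idx)) if idx[a] > idx[b]
        )
        result[tuple(sorted(idx))] += -coeff if inversions % 2 else coeff
    return {key: c for key, c in result.items() if c != 0}


def multiply(f, g):
    """Product of two forms: write the factors side by side and distribute."""
    product = defaultdict(int)
    for i, a in f.items():
        for j, b in g.items():
            product[i + j] += a * b
    return normalize(product)


def pullback(form, coeffs):
    """Pullback of a form in the y's under y_i = sum_j coeffs[i][j] x_j + b_i."""
    total = defaultdict(int)
    for idx, coeff in form.items():
        term = {(): coeff}  # start from the 0-form coeff
        for i in idx:
            dy_i = {(j,): a for j, a in enumerate(coeffs[i]) if a != 0}
            term = multiply(term, dy_i)
        for key, c in term.items():
            total[key] += c
    return {key: c for key, c in total.items() if c != 0}


# Hypothetical map y1 = x1 + x2, y2 = x1 - x2: the pullback of dy1 dy2 is -2 dx1 dx2.
print(pullback({(0, 1): 1}, [[1, 1], [1, -1]]))  # {(0, 1): -2}
```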

The basic fact about this algebra of forms is the follow- 
ing statement which, for reasons to be explained in the 
next section, is known as the Chain Rule. 


Chain Rule

Let

$$y_i = \sum_{j=1}^{n} a_{ij}x_j + b_i \qquad (i = 1, 2, \ldots, m)$$

and

$$z_i = \sum_{j=1}^{m} \alpha_{ij}y_j + \beta_i \qquad (i = 1, 2, \ldots, p)$$

be affine maps* from $(x_1, x_2, \ldots, x_n)$ to $(y_1, y_2, \ldots, y_m)$ and from $(y_1, y_2, \ldots, y_m)$ to $(z_1, z_2, \ldots, z_p)$, so that their composition is an affine map from $(x_1, x_2, \ldots, x_n)$ to $(z_1, z_2, \ldots, z_p)$. Then the pullback of a $k$-form under the composed map is equal to the pullback of the pullback.

*Here, as in Chapter 1, an affine map is a function described by polynomials of the first degree.


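This was proved in §1.3 for 2-forms in two variables and was stated, but not proved, for 3-forms in three variables in §1.4. For 1-forms in any number of variables the Chain Rule is proved by direct computation: The composed map is

$$\begin{aligned} z_j &= \alpha_{j1}(a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n) + \cdots + \alpha_{jm}(a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n) + \text{const.}\\ &= (\alpha_{j1}a_{11} + \alpha_{j2}a_{21} + \cdots + \alpha_{jm}a_{m1})x_1 + \cdots + (\alpha_{j1}a_{1n} + \alpha_{j2}a_{2n} + \cdots + \alpha_{jm}a_{mn})x_n + \text{const.}, \end{aligned}$$

so the pullback of $dz_j$ under the composed map is by definition

$$dz_j = (\alpha_{j1}a_{11} + \cdots + \alpha_{jm}a_{m1})\,dx_1 + \cdots + (\alpha_{j1}a_{1n} + \cdots + \alpha_{jm}a_{mn})\,dx_n,$$

whereas the pullback of the pullback is the pullback of

$$\alpha_{j1}\,dy_1 + \alpha_{j2}\,dy_2 + \cdots + \alpha_{jm}\,dy_m,$$

which is

$$\begin{aligned} &\alpha_{j1}(a_{11}\,dx_1 + \cdots + a_{1n}\,dx_n) + \cdots + \alpha_{jm}(a_{m1}\,dx_1 + \cdots + a_{mn}\,dx_n)\\ &\qquad = (\alpha_{j1}a_{11} + \cdots + \alpha_{jm}a_{m1})\,dx_1 + \cdots + (\alpha_{j1}a_{1n} + \cdots + \alpha_{jm}a_{mn})\,dx_n. \end{aligned}$$

Thus the pullback of $dz_j$ under the composed map is the pullback of the pullback. But then the same is true for an arbitrary 1-form $\sum_j A_j\,dz_j$ by superposition of these cases.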

The proof of the Chain Rule for $k$-forms ($k > 1$) requires a more careful examination of the algebraic rules which govern them, and in particular an examination of the rule $dx_i\,dx_j = -dx_j\,dx_i$. What is the meaning of ‘equal’ in this ‘equation’? Clearly if it is to have any meaning at all then it must mean that the same $k$-form can be represented in different ways; for example, $dx_1\,dx_2\,dx_3 = -dx_1\,dx_3\,dx_2 = dx_3\,dx_1\,dx_2$ must mean that the different expressions $dx_1\,dx_2\,dx_3$ and $dx_3\,dx_1\,dx_2$ represent the same 3-form. This is formalized by the following definitions.


Definitions


A $k$-form in the variables $x_1, x_2, \ldots, x_n$ is represented by an expression which is a sum of a finite number of terms of the form $A\,dx_{i_1}\,dx_{i_2}\cdots dx_{i_k}$ in which $A$ is a number and $i_1, i_2, \ldots, i_k$ are integers $1 \le i_j \le n$. (Or by a sum of no such terms, which is represented by 0.) Two such expressions represent the same $k$-form if one can be obtained from the other by a finite number of applications of the rules

(i) $dx_i\,dx_j = -dx_j\,dx_i$,
(ii) $dx_i\,dx_i = 0$,
(iii) $A\,dx_{i_1}\,dx_{i_2}\cdots dx_{i_k} + B\,dx_{i_1}\,dx_{i_2}\cdots dx_{i_k} = (A + B)\,dx_{i_1}\,dx_{i_2}\cdots dx_{i_k}$,
(iv) the commutative law of addition.

In words: Terms can be arranged in any order (iv). Two terms with the same $dx$’s (identical and in identical order) can be combined (iii). A term which contains two identical factors $dx_i$ adjacent to each other can be stricken (ii). Two adjacent factors $dx_i\,dx_j$ in a term can be interchanged provided the sign of the coefficient $A$ is also changed (i).
Thus, for example,

$$\begin{aligned} 3\,dx_1\,dx_2\,dx_3 + 2\,dx_1\,dx_3\,dx_1 + 4\,dx_3\,dx_2\,dx_1 &= 3\,dx_1\,dx_2\,dx_3 - 2\,dx_1\,dx_1\,dx_3 - 4\,dx_3\,dx_1\,dx_2\\ &= 3\,dx_1\,dx_2\,dx_3 + 4\,dx_1\,dx_3\,dx_2\\ &= 3\,dx_1\,dx_2\,dx_3 - 4\,dx_1\,dx_2\,dx_3\\ &= -dx_1\,dx_2\,dx_3\\ &= dx_2\,dx_1\,dx_3 \end{aligned}$$

etc., where = means ‘represent the same 3-form’ as defined above. Note that the term ‘$k$-form’ itself has not been defined* but that only the terms ‘representation of a $k$-form’ and ‘represent the same $k$-form’ have been defined. This fine point in the nomenclature has no effect on the way that computations are done, but it does simplify the underlying philosophy of the subject considerably. (See Appendix 3.)

*Geometrically the $k$-form $dx_1\,dx_2\cdots dx_k$ should be imagined as the oriented $k$-dimensional volume of the projection on the $x_1x_2\cdots x_k$-plane, a function assigning numbers to oriented $k$-dimensional surfaces in $x_1x_2\cdots x_n$-space. This intuitive interpretation is, of course, not at all a definition.
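The reduction above is nothing but repeated use of rules (i) to (iii). A minimal Python sketch that applies them literally (the function names are hypothetical): it interchanges adjacent factors that are out of order, changing the sign of the coefficient each time (rule (i)), strikes a term as soon as two identical factors are adjacent (rule (ii)), and finally combines terms with identical factors (rule (iii)). Indices are written 1-based to match $dx_1, dx_2, dx_3$.

```python
def reduce_term(coeff, idx):
    """Sort the factors by adjacent interchanges; return None if the term is stricken."""
    idx = list(idx)
    changed = True
    while changed:
        changed = False
        for p in range(len(idx) - 1):
            if idx[p] == idx[p + 1]:
                return None  # rule (ii): identical adjacent factors, so the term is 0
            if idx[p] > idx[p + 1]:
                idx[p], idx[p + 1] = idx[p + 1], idx[p]
                coeff = -coeff  # rule (i): interchange and change the sign
                changed = True
    return coeff, tuple(idx)


def reduce_expression(terms):
    """terms: list of (coefficient, factor indices). Combine equal terms by rule (iii)."""
    combined = {}
    for coeff, idx in terms:
        reduced = reduce_term(coeff, idx)
        if reduced is None:
            continue
        c, key = reduced
        combined[key] = combined.get(key, 0) + c
    return {key: c for key, c in combined.items() if c != 0}


# 3 dx1 dx2 dx3 + 2 dx1 dx3 dx1 + 4 dx3 dx2 dx1
example = [(3, (1, 2, 3)), (2, (1, 3, 1)), (4, (3, 2, 1))]
print(reduce_expression(example))  # {(1, 2, 3): -1}, i.e. -dx1 dx2 dx3
```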

The sum of two $k$-forms $\omega$ and $\sigma$ is the $k$-form $\omega + \sigma$ represented by the expression obtained by adding the terms of a representation of $\omega$ to the terms of a representation of $\sigma$. This definition is valid because of the obvious fact that if either the representation of $\omega$ or the representation of $\sigma$ is changed to another expression representing the same form, then the expression for the sum represents the same form as before.

The product of a $k$-form $\omega$ and an $h$-form $\sigma$ is the $(k + h)$-form $\omega\sigma$ represented by the expression obtained by writing representations of $\omega$ and $\sigma$ side by side in that order and multiplying out using the distributive law of multiplication and the rule

$$A\,dx_{i_1}\,dx_{i_2}\cdots dx_{i_k}\cdot B\,dx_{j_1}\,dx_{j_2}\cdots dx_{j_h} = AB\,dx_{i_1}\,dx_{i_2}\cdots dx_{i_k}\,dx_{j_1}\,dx_{j_2}\cdots dx_{j_h}.$$


This definition is valid because of the obvious fact that if either the representation of $\omega$ or the representation of $\sigma$ is changed to another expression representing the same form, then the expression for the product will represent the same form as before. For example


$$(dx_1 + 2\,dx_2)(3\,dx_1\,dx_2 + 4\,dx_2\,dx_3) = 3\,dx_1\,dx_1\,dx_2 + 4\,dx_1\,dx_2\,dx_3 + 6\,dx_2\,dx_1\,dx_2 + 8\,dx_2\,dx_2\,dx_3$$

and

$$(2\,dx_2 + dx_1)(3\,dx_1\,dx_2 - 4\,dx_3\,dx_2) = 6\,dx_2\,dx_1\,dx_2 - 8\,dx_2\,dx_3\,dx_2 + 3\,dx_1\,dx_1\,dx_2 - 4\,dx_1\,dx_3\,dx_2$$

represent the same 3-form (which is also represented more simply by $4\,dx_1\,dx_2\,dx_3$).
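A short Python check that both products reduce to $4\,dx_1\,dx_2\,dx_3$. This is a sketch under the assumption that forms are stored as dictionaries from index tuples to coefficients; `multiply` juxtaposes factors as in the definition above, and the hypothetical helper `normalize` applies rules (i) to (iii).

```python
from collections import defaultdict


def normalize(form):
    """Rules (i)-(iii): strike repeated factors, sort the rest with a sign, combine."""
    result = defaultdict(int)
    for idx, coeff in form.items():
        if len(set(idx)) < len(idx):
            continue
        inversions = sum(
            1 for a in range(len(idx)) for b in range(a + 1, len(idx)) if idx[a] > idx[b]
        )
        result[tuple(sorted(idx))] += -coeff if inversions % 2 else coeff
    return {key: c for key, c in result.items() if c != 0}


def multiply(f, g):
    """Write f and g side by side (in that order) and use the distributive law."""
    product = defaultdict(int)
    for i, a in f.items():
        for j, b in g.items():
            product[i + j] += a * b
    return normalize(product)


# (dx1 + 2 dx2)(3 dx1 dx2 + 4 dx2 dx3) and (2 dx2 + dx1)(3 dx1 dx2 - 4 dx3 dx2)
first = multiply({(1,): 1, (2,): 2}, {(1, 2): 3, (2, 3): 4})
second = multiply({(2,): 2, (1,): 1}, {(1, 2): 3, (3, 2): -4})
print(first, second)  # {(1, 2, 3): 4} {(1, 2, 3): 4}
```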

A zero-form is a number. If $A$ is a 0-form and $dx_1\,dx_2\cdots dx_k$ is a $k$-form then their product is the $(0 + k)$-form $A\,dx_1\,dx_2\cdots dx_k$ as before.

This completes the definition of the algebraic rules 
which govern computations with forms. 


Clearly $\omega(\sigma_1 + \sigma_2) = \omega\sigma_1 + \omega\sigma_2$, $(\omega_1 + \omega_2)\sigma = \omega_1\sigma + \omega_2\sigma$, and $\omega(\sigma\tau) = (\omega\sigma)\tau$. If $\omega$ and $\sigma$ are 1-forms, then $\omega\sigma = -\sigma\omega$ because $dx_i\,dx_j = -dx_j\,dx_i$ (even if $i = j$, in which case both sides are 0). Therefore if $\omega$ is a 1-form, $\omega\cdot\omega = -\omega\cdot\omega$, so $2\,\omega\cdot\omega = 0$, which implies $\omega\cdot\omega = 0$. Therefore the rules $\omega\cdot\omega = 0$, $\omega\cdot\sigma = -\sigma\cdot\omega$ apply to 1-forms generally, and not just to the 1-forms $dx_i$. (More generally, if $\omega$ is a $k$-form and $\sigma$ is an $h$-form, then $\omega\sigma = (-1)^{hk}\sigma\omega$.)

*This argument can be stated in the language of modern algebra as follows: ‘Pullback’ is an algebra homomorphism from forms in $y$ to forms in $x$. Therefore ‘pullback of the pullback’ is also an algebra homomorphism. To prove it is equal to the algebra homomorphism ‘pullback under the composed map’ it suffices to show the two are equal for a set of generators for the algebra; hence it suffices to show they are the same for the 1-forms $dz_1, dz_2, \ldots, dz_p$, which is done by direct computation.

$\binom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$ for $0 \le k \le n$. It is 1 if $k = 0$ or $k = n$ and 0 if $k > n$. It should be imagined here as the number of $k$-dimensional coordinate planes in $n$-space.
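The sign rule $\omega\sigma = (-1)^{hk}\sigma\omega$ can be spot-checked numerically. In the Python sketch below (dictionary representation as an assumption; helper names hypothetical), $\omega = dx_1 + 2\,dx_2$ and $\tau = dx_3 + 5\,dx_4$ are 1-forms and $\sigma = 3\,dx_1\,dx_3 + dx_2\,dx_4$ is a 2-form, all chosen arbitrarily. The last line also shows that $\omega\cdot\omega = 0$ is special to 1-forms: this 2-form $\sigma$ has $\sigma\sigma \ne 0$.

```python
from collections import defaultdict


def normalize(form):
    """Strike terms with a repeated factor; sort the rest, one sign change per inversion."""
    result = defaultdict(int)
    for idx, coeff in form.items():
        if len(set(idx)) < len(idx):
            continue
        inversions = sum(
            1 for a in range(len(idx)) for b in range(a + 1, len(idx)) if idx[a] > idx[b]
        )
        result[tuple(sorted(idx))] += -coeff if inversions % 2 else coeff
    return {key: c for key, c in result.items() if c != 0}


def multiply(f, g):
    product = defaultdict(int)
    for i, a in f.items():
        for j, b in g.items():
            product[i + j] += a * b
    return normalize(product)


def degree(form):
    return len(next(iter(form)))  # every term of a k-form has k factors


def scale(form, factor):
    return {key: factor * c for key, c in form.items()}


omega = {(1,): 1, (2,): 2}      # 1-form
tau = {(3,): 1, (4,): 5}        # 1-form
sigma = {(1, 3): 3, (2, 4): 1}  # 2-form

for f, g in [(omega, tau), (omega, sigma), (sigma, tau), (sigma, sigma)]:
    expected_sign = (-1) ** (degree(f) * degree(g))
    assert multiply(f, g) == scale(multiply(g, f), expected_sign)

print(multiply(omega, omega))  # {}
print(multiply(sigma, sigma))  # {(1, 2, 3, 4): -6}
```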

In terms of these algebraic operations on forms, the operation of forming the pullback of a $k$-form $\omega$ in $y_1, y_2, \ldots, y_m$ under an affine map

$$y_i = \sum_{j=1}^{n} a_{ij}x_j + b_i \qquad (i = 1, 2, \ldots, m)$$

is literally an operation of substitution of the 1-forms $\sum_j a_{ij}\,dx_j$ for $dy_i$ in an expression representing $\omega$, the resulting sums and products being regarded as sums and products of 0-forms and 1-forms in the variables $x_1, x_2, \ldots, x_n$. This definition is valid because of the obvious fact that if the expression of $\omega$ is replaced by another expression representing the same $k$-form then the resulting expression for the pullback of $\omega$ represents the same $k$-form (because $\omega\cdot\omega = 0$ and $\omega\sigma = -\sigma\omega$ hold for all 1-forms $\omega$, $\sigma$).

Clearly the pullback of a sum is the sum of the pullbacks and the pullback of a product is the product of the pullbacks. Therefore the pullback of a $k$-form $\omega$ can be found by taking any expression of $\omega$ as a sum of products of 0-forms and 1-forms, and taking pullbacks of the 1-forms separately (the pullback of a 0-form is itself). Thus the pullback operation is completely determined once its effect on 1-forms is known. But the same is true of the operation ‘pullback of the pullback’ and, because for 1-forms this is identical to the operation ‘pullback under the composed map’, the same is true for $k$-forms.* This proves the Chain Rule.
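The Chain Rule can also be spot-checked numerically. The Python sketch below (helper names hypothetical; indices 0-based) pulls back $dz_1\,dz_2$ once under the composed map and once in two stages. The maps $y_1 = x_1 + 2x_2$, $y_2 = x_2 - x_3$, $y_3 = 3x_1 + x_3$ and $z_1 = y_1 + y_2$, $z_2 = 2y_1 + y_3$ are made up for the illustration, and the coefficients of the composed map are $\sum_k \alpha_{ik}a_{kj}$, as in the computation for 1-forms above. Both routes give $-11\,dx_1\,dx_2 + 6\,dx_1\,dx_3 + 7\,dx_2\,dx_3$.

```python
from collections import defaultdict


def normalize(form):
    """dx_i dx_i = 0, dx_i dx_j = -dx_j dx_i: bring each term to increasing indices."""
    result = defaultdict(int)
    for idx, coeff in form.items():
        if len(set(idx)) < len(idx):
            continue
        inversions = sum(
            1 for a in range(len(idx)) for b in range(a + 1, len(idx)) if idx[a] > idx[b]
        )
        result[tuple(sorted(idx))] += -coeff if inversions % 2 else coeff
    return {key: c for key, c in result.items() if c != 0}


def multiply(f, g):
    product = defaultdict(int)
    for i, a in f.items():
        for j, b in g.items():
            product[i + j] += a * b
    return normalize(product)


def pullback(form, coeffs):
    """Substitute dy_i = sum_j coeffs[i][j] dx_j into each term and multiply out."""
    total = defaultdict(int)
    for idx, coeff in form.items():
        term = {(): coeff}
        for i in idx:
            term = multiply(term, {(j,): a for j, a in enumerate(coeffs[i]) if a != 0})
        for key, c in term.items():
            total[key] += c
    return {key: c for key, c in total.items() if c != 0}


def compose(alpha, a):
    """Coefficients of z = alpha(y) after y = a(x): entry (i, j) is sum_k alpha[i][k] a[k][j]."""
    return [[sum(alpha[i][k] * a[k][j] for k in range(len(a))) for j in range(len(a[0]))]
            for i in range(len(alpha))]


a = [[1, 2, 0], [0, 1, -1], [3, 0, 1]]  # x -> y
alpha = [[1, 1, 0], [2, 0, 1]]          # y -> z
dz1_dz2 = {(0, 1): 1}

direct = pullback(dz1_dz2, compose(alpha, a))
two_stage = pullback(pullback(dz1_dz2, alpha), a)
print(direct == two_stage)  # True
```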


In computing with forms it is very useful to allow the 
expression as much latitude as possible—as was done in 
the above proof. On the other hand it is also useful, 
particularly in deciding whether two different expressions 
represent the same k-form, to have a standard format for 
the expression of a k-form. This is accomplished as 
follows. 


Definition

An expression representing a $k$-form in the variables $x_1, x_2, \ldots, x_n$ is said to be in lexicographic (dictionary) order if it answers to the following description: The expression is a sum of $\binom{n}{k}$ terms, where $\binom{n}{k}$ is the binomial coefficient. In particular, if $k > n$ then there are no terms and the expression is 0. If $k \le n$ then the expression is a sum

$$\sum_{1 \le j_1 < j_2 < \cdots < j_k \le n} A_{j_1 j_2 \cdots j_k}\,dx_{j_1}\,dx_{j_2}\cdots dx_{j_k}$$

of $\binom{n}{k}$ terms ordered so that the term corresponding to $(j_1, j_2, \ldots, j_k)$ precedes all terms in which $j_1$ is greater, all terms in which $j_1$ is the same but $j_2$ is greater, all terms in which $j_1$ and $j_2$ are the same but $j_3$ is greater, and so forth. In other words, the $k$-tuples $(j_1, j_2, \ldots, j_k)$ in which $1 \le j_1 < j_2 < \cdots < j_k \le n$ (there are $\binom{n}{k}$ such $k$-tuples) are ordered like words in a dictionary, so that $(j_1, \ldots, j_k)$ precedes $(j_1', \ldots, j_k')$ if $j_i < j_i'$ in the first position $i$ where they differ. The numbers $A_{j_1 j_2 \cdots j_k}$ may be zero, but the terms are written anyway so that there are always $\binom{n}{k}$ terms.


The idea of lexicographic order is easier to illustrate than it is to define in full. For example, for 3-forms in $x_1, x_2, x_3, x_4, x_5$, the triples in order are

(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5),
(1, 4, 5), (2, 3, 4), (2, 3, 5), (2, 4, 5), (3, 4, 5)

and an expression in lexicographic order is an expression of the form

$$A_{123}\,dx_1\,dx_2\,dx_3 + A_{124}\,dx_1\,dx_2\,dx_4 + \cdots + A_{245}\,dx_2\,dx_4\,dx_5 + A_{345}\,dx_3\,dx_4\,dx_5$$

(10 terms).
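In Python, `itertools.combinations` emits the increasing $k$-tuples in exactly this dictionary order, so a sketch of the definition takes only a few lines (the function names are hypothetical). Tuples use the 1-based indices of the text, and a form is assumed to be given as a dictionary whose keys are already increasing tuples.

```python
from itertools import combinations
from math import comb


def lexicographic_tuples(n, k):
    """All (j_1, ..., j_k) with 1 <= j_1 < ... < j_k <= n, in dictionary order."""
    return list(combinations(range(1, n + 1), k))


def lexicographic_coefficients(form, n, k):
    """The coefficients A_{j_1...j_k}, in lexicographic order, with zeros written out."""
    return [form.get(t, 0) for t in lexicographic_tuples(n, k)]


triples = lexicographic_tuples(5, 3)
print(triples)
print(len(triples), comb(5, 3))  # 10 10
print(lexicographic_coefficients({(2, 4, 5): 7}, 5, 3))  # [0, 0, 0, 0, 0, 0, 0, 0, 7, 0]
```

For $k > n$ the list of tuples is empty and `math.comb(n, k)` returns 0, matching the convention that the expression is then 0.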


Theorem 


Every k-form can be represented by an expression in 
lexicographic order. Two expressions in lexicographic 
order represent the same k-form only if they are identical. 


Proof 


The first statement of the theorem is the observation that 
the rules defining ‘represent the same k-form’ allow one 
to reduce any expression representing a k-form to one in 
lexicographic order. This is easily seen to be true (see 
Exercises 1, 2, 3). 

Suppose now that two expressions $\omega_1$ and $\omega_2$ in lexicographic order represent the same $k$-form but are not identical. Let $dx_{j_1}\,dx_{j_2}\cdots dx_{j_k}$ be a term in which $\omega_1$ and $\omega_2$ differ and let $\tilde\omega_1$, $\tilde\omega_2$ be the pullbacks of $\omega_1$, $\omega_2$ under the affine map

$$x_{j_1} = u_1,\quad x_{j_2} = u_2,\quad \ldots,\quad x_{j_k} = u_k, \qquad x_i = 0 \quad (i \ne j_1, j_2, \ldots, j_k)$$

of $(u_1, u_2, \ldots, u_k)$ to $(x_1, x_2, \ldots, x_n)$. Then $\tilde\omega_1$, $\tilde\omega_2$ are of the form $A_1\,du_1\,du_2\cdots du_k$, $A_2\,du_1\,du_2\cdots du_k$ with $A_1 \ne A_2$, but $\tilde\omega_1$, $\tilde\omega_2$ represent the same $k$-form in $u_1, u_2, \ldots, u_k$. (If the two expressions represent the same form then so do their pullbacks; for the particular affine map here the proof of this fact is utterly trivial.) This means that it is possible to interpolate expressions $\sigma_0, \sigma_1, \sigma_2, \ldots, \sigma_N$ representing $k$-forms in $u_1, u_2, \ldots, u_k$ such that $\sigma_0$ is identical to $A_1\,du_1\,du_2\cdots du_k$, such that $\sigma_N$ is identical to $A_2\,du_1\,du_2\cdots du_k$, and such that the step from $\sigma_i$ to $\sigma_{i+1}$ involves just one application of just one of the rules

(i) $du_i\,du_j = -du_j\,du_i$,
(ii) $du_i\,du_i = 0$,
(iii) $A\,du_{i_1}\,du_{i_2}\cdots du_{i_k} + B\,du_{i_1}\,du_{i_2}\cdots du_{i_k} = (A + B)\,du_{i_1}\,du_{i_2}\cdots du_{i_k}$,
(iv) the commutative law of addition.

It is to be shown that if $\sigma_0, \sigma_1, \sigma_2, \ldots, \sigma_N$ is such a sequence of expressions then $A_1 = A_2$.

The proof is by induction on $k$. If $k = 1$ then the expressions $\sigma_i$ are all sums of multiples of $du_1$ (there are no other $du$’s) and rules (i) and (ii) do not apply. Let $\sigma_i = C_{i1}\,du_1 + C_{i2}\,du_1 + \cdots + C_{iM}\,du_1$; then the sum $C_{i1} + C_{i2} + \cdots + C_{iM}$ is the same for all $i$ because either of the changes allowed by (iii) and (iv) leaves the sum unchanged. Since this sum is $A_1$ when $i = 0$ and $A_2$ when $i = N$, it follows that $A_1 = A_2$ and the case $k = 1$ is proved.

Now suppose the case $k - 1$ is proved and suppose that $\sigma_0, \sigma_1, \ldots, \sigma_N$ are as above. Note first that if any and all terms in the $\sigma_i$ which contain repeated factors (not necessarily adjacent) are stricken, then the new sequence $\sigma_0, \sigma_1, \ldots, \sigma_N$ has the same properties as before, except that at a step from $\sigma_i$ to $\sigma_{i+1}$ where rule (ii) was used the new $\sigma_i$ and $\sigma_{i+1}$ are now identical. Therefore it can be assumed at the outset that the given expressions $\sigma_0, \sigma_1, \sigma_2, \ldots, \sigma_N$ contain no terms with repeated factors and that rule (ii) is never used. Thus each term of each $\sigma_i$ is a product of $k$ distinct factors chosen from $du_1, du_2, \ldots, du_k$, so it contains the factor $du_k$ exactly once. Define a sequence of expressions $\hat\sigma_0, \hat\sigma_1, \ldots, \hat\sigma_N$ representing $(k-1)$-forms in $u_1, u_2, \ldots, u_{k-1}$ as follows: In each term of each $\sigma_i$ strike the factor $du_k$ and multiply the term by $\pm 1$, using $+1$ if $du_k$ occurs in the $k$th, $(k-2)$nd, $(k-4)$th, $\ldots$, $(k-2\nu)$th place in this product of $k$ factors, and using $-1$ if it occurs in the $(k-1)$st, $(k-3)$rd, $\ldots$, $(k-2\nu+1)$st place. Then $\hat\sigma_0 = A_1\,du_1\,du_2\cdots du_{k-1}$ and $\hat\sigma_N = A_2\,du_1\,du_2\cdots du_{k-1}$, and it suffices to show that the expressions $\hat\sigma_0, \hat\sigma_1, \hat\sigma_2, \ldots, \hat\sigma_N$ have the property that each can be obtained from the preceding by one of the rules (i) to (iv). If the step from $\sigma_i$ to $\sigma_{i+1}$ involves one of the rules (iii) or (iv) then the step from $\hat\sigma_i$ to $\hat\sigma_{i+1}$ is obtained by the same rule. If the step from $\sigma_i$ to $\sigma_{i+1}$ uses rule (i) to interchange two adjacent factors other than $du_k$, then the same applies to the step from $\hat\sigma_i$ to $\hat\sigma_{i+1}$. Finally, if the step from $\sigma_i$ to $\sigma_{i+1}$ interchanges $du_k$ with another $du$, then this term changes sign between $\sigma_i$ and $\sigma_{i+1}$ and the position of $du_k$ changes by one, which implies that $\hat\sigma_i = \hat\sigma_{i+1}$. This completes the proof of the theorem.


Exercises


In the following exercises, sequences of consecutive letters $(u, v, w, x, y, z)$, $(p, q, r, s, t)$, $(\alpha, \beta, \gamma, \delta)$ will be used to denote the variables. Then ‘lexicographic order’ means quite literally ‘alphabetical order’ in the everyday sense.


1 Find expressions in lexicographic order which represent the same 4-forms as the following.


(a) $3\,dw\,dx\,du\,dv - 4\,du\,dw\,dv\,dx + 2\,du\,dy\,dw\,dz$

(b) $(3\,du\,dv + 4\,du\,dw)(dy + dz)\,dw$

(c) $(du\,dv + dx\,dy)(du\,dv + dx\,dy)$

(d) $(4\,du\,dv\,dw + 4\,dx\,dy\,dz + 2\,du\,dw\,dy + dv\,dx\,dz)(du + dw + dy)$


2 Find expressions in lexicographic order which represent the pullback of each of the 4-forms of Exercise 1 under the affine map

$$\begin{aligned} u &= r + s + 1\\ v &= r - s\\ w &= 2p + t\\ x &= p - q\\ y &= q - r - 7\\ z &= p + q + r + s + t. \end{aligned}$$


3 Find expressions in lexicographic order of the pullbacks of the 4-forms of Exercise 2 under the map

p = 4a − 384 + 7
= 2a − ô + 14
a + Bp
a − p
= 144.


4 Find the composed map $(\alpha, \beta, \gamma, \delta) \to (u, v, w, x, y, z)$ of the maps of Exercises 2 and 3, find the pullbacks of the forms of Exercise 1 by direct computation, and verify the Chain Rule by comparing these answers with the answers to Exercise 3.


5 List all possible orders in which the 1-forms $dx$, $dy$, $dz$, $dw$ can be multiplied to give a 4-form. Which of these represent the same 4-form as $dx\,dy\,dz\,dw$ and which the same as $-dx\,dy\,dz\,dw$?


4.3


Matrix Notation. Jacobians


In most cases, the additive constants $b_i$ in an affine map $y_i = \sum_j a_{ij}x_j + b_i$ are of minor importance. This is emphasized by writing the affine map in the form

$$y_i = \sum_{j=1}^{n} a_{ij}x_j + \text{const.} \qquad (i = 1, 2, \ldots, m) \tag{1}$$

The numbers $a_{ij}$, which are called the coefficients of the affine map, can be conveniently exhibited, without the $y$’s and $x$’s, in a rectangular array or matrix called the matrix of coefficients of the map (1). Thus, for example, the matrix of coefficients of the map

$$\begin{aligned} u &= 3x - y + 2z + 8\\ v &= x + y + 4z + 14 \end{aligned}$$

is

$$\begin{pmatrix} 3 & -1 & 2\\ 1 & 1 & 4 \end{pmatrix}.$$

In particular, the number of columns of the matrix of coefficients is equal to the dimension of the domain of the affine map and the number of rows is equal to the dimension of the range.*

*If $f: A \to B$ is a function from the set $A$ to the set $B$, then $A$ is called the domain of $f$ and $B$ the range. Above, the domain is $xyz$-space and the range is $uv$-space.

In general a matrix is a rectangular array of numbers, and an $m \times n$ matrix is a matrix with $m$ rows and $n$ columns. [Thus the matrix above is $2 \times 3$.] A matrix is therefore simply a convenient format in which sets of $mn$
numbers can be recorded, which per se is of very little 
interest or importance. What is important, rather, is that 
certain operations with matrices occur in many contexts 
and are useful in a great variety of problems. This 
section is devoted to two of the most important opera- 
tions with matrices, namely matrix product and exterior 
power of a matrix. They are in fact nothing new at all, 
but are simply the operations of composition of affine 
maps and computation of pullbacks under affine maps 
written in a new notation. 

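The first operation, matrix product, is the operation of composition of affine maps. Consider, for example, two affine maps

$$\begin{aligned} u &= 3x - y + 2z + \text{const.}\\ v &= x + y + 4z + \text{const.} \end{aligned} \qquad\qquad \begin{aligned} r &= u + 8v + \text{const.}\\ s &= 7u + 13v + \text{const.}\\ t &= -4u + 6v + \text{const.} \end{aligned}$$

with matrices of coefficients

$$M = \begin{pmatrix} 3 & -1 & 2\\ 1 & 1 & 4 \end{pmatrix}, \qquad N = \begin{pmatrix} 1 & 8\\ 7 & 13\\ -4 & 6 \end{pmatrix}$$

respectively. Their composition is given by

$$\begin{aligned} r &= (3x - y + 2z + \text{const.}) + 8(x + y + 4z + \text{const.}) + \text{const.}\\ &= [3 + 8]x + [-1 + 8]y + [2 + 8\cdot 4]z + \text{const.}\\ &= 11x + 7y + 34z + \text{const.}\\ s &= 7(3x - y + 2z + \text{const.}) + 13(x + y + 4z + \text{const.}) + \text{const.}\\ &= [7\cdot 3 + 13]x + [7\cdot(-1) + 13]y + [7\cdot 2 + 13\cdot 4]z + \text{const.}\\ &= 34x + 6y + 66z + \text{const.}\\ t &= -4(3x - y + 2z + \text{const.}) + 6(x + y + 4z + \text{const.}) + \text{const.}\\ &= [(-4)\cdot 3 + 6]x + [(-4)(-1) + 6]y + [(-4)\cdot 2 + 6\cdot 4]z + \text{const.}\\ &= -6x + 10y + 16z + \text{const.} \end{aligned}$$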

The matrix of coefficients of the composed map, namely

$$\begin{pmatrix} 11 & 7 & 34\\ 34 & 6 & 66\\ -6 & 10 & 16 \end{pmatrix},$$

depends only on the matrices of coefficients $M$ and $N$ of the maps themselves. It is called the product (or composition) of these two matrices. The factors are written in the same reversed order $NM$ as are the factors of a composed map*; thus

$$\begin{pmatrix} 11 & 7 & 34\\ 34 & 6 & 66\\ -6 & 10 & 16 \end{pmatrix} = \begin{pmatrix} 1 & 8\\ 7 & 13\\ -4 & 6 \end{pmatrix}\begin{pmatrix} 3 & -1 & 2\\ 1 & 1 & 4 \end{pmatrix}. \tag{2}$$

*If $f$ and $g$ are functions $f: A \to B$, $g: B \to C$, then the composed function $A \to C$ is denoted $g \circ f$, that is, ‘$g$ of $f$ of an element of $A$’.


More generally, if two affine maps

$$y_i = \sum_{j=1}^{n} a_{ij}x_j + \text{const.} \qquad (i = 1, 2, \ldots, m)$$

and

$$z_i = \sum_{j=1}^{m} b_{ij}y_j + \text{const.} \qquad (i = 1, 2, \ldots, p)$$

are given, with matrices of coefficients $M = (a_{ij})$ and $N = (b_{ij})$ respectively, then the matrix of coefficients of the composed map

$$z_i = \sum_{j=1}^{n}\Bigl[\sum_{k=1}^{m} b_{ik}a_{kj}\Bigr]x_j + \text{const.} \tag{3}$$

can be found from the matrices $M$ and $N$ and is called their product $NM$. In this way the $m \times n$ matrix $M$ combines with the $p \times m$ matrix $N$ to give the $p \times n$ matrix $NM$. Formula (3) shows that the coefficients of the matrix $NM$ are found from the coefficients of $N$ and $M$ by the rule:

(4) The term in the $i$th row and $j$th column of $NM$ is equal to $\sum_{k=1}^{m} b_{ik}a_{kj}$, where $b_{ik}$ is the term in the $i$th row and $k$th column of $N$ and $a_{kj}$ is the term in the $k$th row and $j$th column of $M$.


In other words, the term in the ith row and jth column 
of NM is found from the ith row of N and the jth column 
of M by multiplying corresponding terms and adding. 
For example, the term 66 in the second row and the third column of the product (2) is found from the row and column

$$\begin{pmatrix} 7 & 13 \end{pmatrix}, \qquad \begin{pmatrix} 2\\ 4 \end{pmatrix}$$

by multiplying corresponding terms and adding: $7\cdot 2 + 13\cdot 4 = 66$.
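Rule (4) translates directly into code. A minimal Python sketch (the function name is hypothetical; matrices are stored row by row as lists), checked against the product (2) and against the product in the opposite order:

```python
def matrix_product(N, M):
    """NM by rule (4): the entry in row i, column j is sum_k N[i][k] * M[k][j]."""
    if len(N[0]) != len(M):
        raise ValueError("the number of columns of N must equal the number of rows of M")
    return [[sum(N[i][k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]
            for i in range(len(N))]


M = [[3, -1, 2], [1, 1, 4]]
N = [[1, 8], [7, 13], [-4, 6]]
print(matrix_product(N, M))  # [[11, 7, 34], [34, 6, 66], [-6, 10, 16]]
print(matrix_product(M, N))  # [[-12, 23], [-8, 45]]
```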

Note that the product $NM$ of two matrices $N$ and $M$ is defined if and only if the number of columns of $N$ is equal to the number of rows of $M$. Thus, for example, the matrices $N$, $M$ above can also be multiplied in the opposite order, and the result can be found by the rule (4) to be

$$MN = \begin{pmatrix} 3 & -1 & 2\\ 1 & 1 & 4 \end{pmatrix}\begin{pmatrix} 1 & 8\\ 7 & 13\\ -4 & 6 \end{pmatrix} = \begin{pmatrix} -12 & 23\\ -8 & 45 \end{pmatrix}.$$

The computation of products of matrices is a simple procedure which can be (and should be) mastered with a little practice. It can be carried out as a two-handed operation, which involves running the index finger of the left hand across a row of the first factor while running the index finger of the right hand down a column of the second, multiplying corresponding entries and keeping a running total (assuming the numbers are simple enough that the arithmetic can be done mentally). Once again, this process merely represents a streamlined computation of the matrix of coefficients of a composed affine map.

The second operation, formation of the exterior powers* of a matrix, is an abbreviated format for the computation of pullbacks of forms under an affine map. More specifically, the $k$th exterior power $M^{(k)}$ of a matrix $M$ tells how to find the pullback of $k$-forms under an affine map whose matrix of coefficients is $M$.

*This name is not very descriptive of the operation, since it does not resemble at all the operation of multiplying the matrix by itself (which would be possible only for square matrices) and since there is nothing ‘exterior’ about it.

Although it is clear that only the matrix of coefficients $M = (a_{ij})$ is used in finding the pullback of a $k$-form under an affine map $y_i = \sum_j a_{ij}x_j + \text{const.}$, it is not clear how best to organize the computation. Consider the example of the map
of the map 


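$$\begin{aligned} p &= 4S + 4T + 3U - V + 2\\ q &= S + 2T + 3U + V - 3\\ r &= 4S + 8T + 10U + 2V + 2 \end{aligned}$$

of §4.1. The matrix of coefficients is

$$M = \begin{pmatrix} 4 & 4 & 3 & -1\\ 1 & 2 & 3 & 1\\ 4 & 8 & 10 & 2 \end{pmatrix},$$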


and the pullback maps are as follows. 


1-forms


A 1-form on $pqr$-space can be written $A\,dp + B\,dq + C\,dr$. Its pullback is

$$\begin{aligned} &A(4\,dS + 4\,dT + 3\,dU - dV)\\ &\quad + B(dS + 2\,dT + 3\,dU + dV)\\ &\quad + C(4\,dS + 8\,dT + 10\,dU + 2\,dV)\\ &= (4A + B + 4C)\,dS\\ &\quad + (4A + 2B + 8C)\,dT\\ &\quad + (3A + 3B + 10C)\,dU\\ &\quad + (-A + B + 2C)\,dV. \end{aligned}$$

This formula is summarized by the matrix

$$M^{(1)} = \begin{pmatrix} 4 & 1 & 4\\ 4 & 2 & 8\\ 3 & 3 & 10\\ -1 & 1 & 2 \end{pmatrix},$$

which tells how to find the pullback of $A\,dp + B\,dq + C\,dr$ given the numbers $A, B, C$. Note that $M^{(1)}$ can be obtained from $M$ merely by interchanging rows and columns; that is, the entry in the $i$th row and $j$th column of $M^{(1)}$ is the entry in the $i$th column and $j$th row of $M$. $M^{(1)}$ is also called the transpose or adjoint of $M$ and is also denoted by $M^t$ or $M^*$ as well as by $M^{(1)}$.
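A Python sketch of the 1-form case (function names hypothetical): $M^{(1)}$ is the transpose of $M$, and multiplying it by the column $(A, B, C)$ reproduces the coefficients $4A + B + 4C$, $4A + 2B + 8C$, and so on, computed above. The choice $A, B, C = 1, 2, 3$ is arbitrary.

```python
def transpose(M):
    """M^(1): the entry in row i, column j is M[j][i]."""
    return [[M[j][i] for j in range(len(M))] for i in range(len(M[0]))]


def pullback_coefficients(M1, coeffs):
    """Multiply M^(1) by the column of a 1-form's coefficients (A, B, C)."""
    return [sum(row[j] * coeffs[j] for j in range(len(coeffs))) for row in M1]


M = [[4, 4, 3, -1], [1, 2, 3, 1], [4, 8, 10, 2]]  # rows p, q, r; columns S, T, U, V
M1 = transpose(M)
print(M1)  # [[4, 1, 4], [4, 2, 8], [3, 3, 10], [-1, 1, 2]]
# Pullback of 1 dp + 2 dq + 3 dr: components on dS, dT, dU, dV.
print(pullback_coefficients(M1, [1, 2, 3]))  # [18, 32, 39, 7]
```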


2-forms


A 2-form on $pqr$-space can be written $A\,dp\,dq + B\,dp\,dr + C\,dq\,dr$. (Here, and in the remainder of this section, the lexicographic order is used.) The pullback of $A\,dp\,dq + B\,dp\,dr + C\,dq\,dr$ is

$$\begin{aligned} &A(4\,dS + 4\,dT + 3\,dU - dV)(dS + 2\,dT + 3\,dU + dV)\\ &\quad + B(4\,dS + 4\,dT + 3\,dU - dV)(4\,dS + 8\,dT + 10\,dU + 2\,dV)\\ &\quad + C(dS + 2\,dT + 3\,dU + dV)(4\,dS + 8\,dT + 10\,dU + 2\,dV)\\ &= (4A + 16B + 0\cdot C)\,dS\,dT\\ &\quad + (9A + 28B - 2C)\,dS\,dU\\ &\quad + (5A + 12B - 2C)\,dS\,dV\\ &\quad + (6A + 16B - 4C)\,dT\,dU\\ &\quad + (6A + 16B - 4C)\,dT\,dV\\ &\quad + (6A + 16B - 4C)\,dU\,dV, \end{aligned}$$

where the terms have again been arranged in lexicographic order $dS\,dT$, $dS\,dU$, $dS\,dV$, $dT\,dU$, $dT\,dV$, $dU\,dV$. This formula is summarized by the matrix

$$M^{(2)} = \begin{pmatrix} 4 & 16 & 0\\ 9 & 28 & -2\\ 5 & 12 & -2\\ 6 & 16 & -4\\ 6 & 16 & -4\\ 6 & 16 & -4 \end{pmatrix},$$

which tells how to find the pullback of $A\,dp\,dq + B\,dp\,dr + C\,dq\,dr$, given the numbers $A, B, C$. The columns of $M^{(2)}$ correspond to the three basic 2-forms $dp\,dq$, $dp\,dr$, $dq\,dr$ in that order, and the rows correspond to the six basic 2-forms $dS\,dT$, $dS\,dU$, etc. The entries in each column simply give the components of the pullback of the corresponding 2-form.
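The same bookkeeping gives $M^{(k)}$ for any $k$. In this Python sketch (function names hypothetical, indices 0-based), each entry is the coefficient of a basic $k$-form on the domain in the pullback of a basic $k$-form on the range, found by multiplying out the pulled-back 1-forms; the loops over `itertools.combinations` produce the lexicographic order. For the example map it reproduces the matrix $M^{(2)}$ above row by row.

```python
from itertools import combinations, permutations


def sign(seq):
    """+1 for an even number of inversions, -1 for an odd number."""
    inversions = sum(
        1 for a in range(len(seq)) for b in range(a + 1, len(seq)) if seq[a] > seq[b]
    )
    return -1 if inversions % 2 else 1


def exterior_power(M, k):
    """M^(k): one row per increasing k-tuple of domain coordinates, one column per
    increasing k-tuple of range coordinates, both in lexicographic order."""
    m, n = len(M), len(M[0])
    result = []
    for domain_idx in combinations(range(n), k):      # basic k-form on the domain
        row = []
        for range_idx in combinations(range(m), k):   # basic k-form on the range
            entry = 0
            for perm in permutations(range(k)):
                term = sign(perm)
                for t in range(k):
                    term *= M[range_idx[t]][domain_idx[perm[t]]]
                entry += term
            row.append(entry)
        result.append(row)
    return result


M = [[4, 4, 3, -1], [1, 2, 3, 1], [4, 8, 10, 2]]
# rows: dS dT, dS dU, dS dV, dT dU, dT dV, dU dV; columns: dp dq, dp dr, dq dr
for line in exterior_power(M, 2):
    print(line)
```

When $k$ exceeds the number of domain coordinates the result has no rows, and when it exceeds only the number of range coordinates each row is empty, matching the ‘matrix with no rows’ and ‘matrix with no columns’ conventions.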


3-forms


A 3-form on $pqr$-space can be written $A\,dp\,dq\,dr$. Its pullback is

$$0\cdot dS\,dT\,dU + 0\cdot dS\,dT\,dV + 0\cdot dS\,dU\,dV + 0\cdot dT\,dU\,dV,$$

as was seen in §4.1; hence for this example

$$M^{(3)} = \begin{pmatrix} 0\\ 0\\ 0\\ 0 \end{pmatrix}.$$

This matrix has one column for each of the basic 3-forms on the range and one row for each of the basic 3-forms on the domain.

In general, the $k$th exterior power $M^{(k)}$ of an $m \times n$ matrix $M$ can be described as follows: Consider $M$ as the matrix of coefficients of an affine map from $n$-space to $m$-space. Consider the coordinates on $n$-space and $m$-space as being ordered, and use the lexicographic ordering in representing $k$-forms, so that $k$-forms on $m$-space are represented by their $\binom{m}{k}$ coefficients (in order) and $k$-forms on $n$-space by their $\binom{n}{k}$ coefficients (in order). Then the pullback map is represented as a map from $\binom{m}{k}$-space to $\binom{n}{k}$-space. $M^{(k)}$ is the $\binom{n}{k} \times \binom{m}{k}$ matrix of coefficients of this map. That is, $M^{(k)}$ has one column for each of the $\binom{m}{k}$ basic $k$-forms on $m$-space and one row for each of the $\binom{n}{k}$ basic $k$-forms on $n$-space (both in lexicographic order). The entries of $M^{(k)}$ give the coefficient of the corresponding $k$-form on $n$-space in the pullback of the corresponding $k$-form on $m$-space. [If $k > n$ or $k > m$ then $M^{(k)}$ is a ‘matrix with no rows’ or ‘matrix with no columns’.]

It is useful to have a notation for the individual entries of $M^{(k)}$. Since

$$\begin{aligned} dp &= \frac{\partial p}{\partial S}\,dS + \frac{\partial p}{\partial T}\,dT + \frac{\partial p}{\partial U}\,dU + \frac{\partial p}{\partial V}\,dV\\ dq &= \frac{\partial q}{\partial S}\,dS + \frac{\partial q}{\partial T}\,dT + \frac{\partial q}{\partial U}\,dU + \frac{\partial q}{\partial V}\,dV\\ dr &= \frac{\partial r}{\partial S}\,dS + \frac{\partial r}{\partial T}\,dT + \frac{\partial r}{\partial U}\,dU + \frac{\partial r}{\partial V}\,dV, \end{aligned}$$

the notation $\partial p/\partial S$ is already used for the coefficient of $dS$ in the pullback of $dp$, which is the entry of $M^{(1)}$ in the row corresponding to $dS$ and the column corresponding to $dp$. This notation is generalized to 2-forms by writing*

$$\begin{aligned} dp\,dq &= \frac{\partial(p,q)}{\partial(S,T)}\,dS\,dT + \frac{\partial(p,q)}{\partial(S,U)}\,dS\,dU + \cdots\\ dp\,dr &= \frac{\partial(p,r)}{\partial(S,T)}\,dS\,dT + \frac{\partial(p,r)}{\partial(S,U)}\,dS\,dU + \cdots\\ dq\,dr &= \frac{\partial(q,r)}{\partial(S,T)}\,dS\,dT + \frac{\partial(q,r)}{\partial(S,U)}\,dS\,dU + \cdots \end{aligned}$$

so that, for example, $\dfrac{\partial(p,r)}{\partial(S,U)}$ denotes the coefficient of $dS\,dU$ in the pullback of $dp\,dr$, that is, the entry of $M^{(2)}$ in the row corresponding to $dS\,dU$ and the column corresponding to $dp\,dr$; in the example above $\dfrac{\partial(p,r)}{\partial(S,U)} = 28$. In general, the entries of $M^{(k)}$ are denoted by symbols of the form

$$\frac{\partial(k \text{ variables on the range})}{\partial(k \text{ variables on the domain})}$$

called the Jacobian (or Jacobian determinant) of the $k$ variables in the ‘numerator’ with respect to the $k$ variables in the ‘denominator’.

*The symbol $\dfrac{\partial(p,q)}{\partial(S,T)}$ is read ‘the Jacobian of $p, q$ with respect to $S, T$’.
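A Python sketch of the Jacobian symbol for the example map (the names `RANGE`, `DOMAIN` and `jacobian` are hypothetical). Each symbol is computed directly as a coefficient of the pullback, so reversing the order of the variables in the numerator changes the sign, in line with $dr\,dp = -dp\,dr$.

```python
from itertools import permutations

RANGE = ["p", "q", "r"]
DOMAIN = ["S", "T", "U", "V"]
M = [[4, 4, 3, -1], [1, 2, 3, 1], [4, 8, 10, 2]]  # rows p, q, r; columns S, T, U, V


def sign(seq):
    inversions = sum(
        1 for a in range(len(seq)) for b in range(a + 1, len(seq)) if seq[a] > seq[b]
    )
    return -1 if inversions % 2 else 1


def jacobian(numerator, denominator):
    """d(numerator)/d(denominator): the coefficient of d(den_1) ... d(den_k) in the
    pullback of d(num_1) ... d(num_k), found by multiplying out the 1-forms."""
    rows = [RANGE.index(v) for v in numerator]
    cols = [DOMAIN.index(v) for v in denominator]
    total = 0
    for perm in permutations(range(len(cols))):
        term = sign(perm)
        for t, i in enumerate(rows):
            term *= M[i][cols[perm[t]]]
        total += term
    return total


print(jacobian("pr", "SU"))  # 28
print(jacobian("rp", "SU"))  # -28, since dr dp = -dp dr
print(jacobian("p", "V"))    # -1, the single-variable case dp/dV
```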


4.3 / Matrix Notation. Jacobians 


*So called because this number 
determines whether the n equations 
in n unknowns Yi = > )aijxX; can be 
solved for the x's as functions of the 
y's. This fact, which is a special case 
of the Implicit Function Theorem, is 
proved in the next section. If the 
determinant of the matrix of 
coefficients is # O then solution is 
possible, otherwise it is impossible. 


101 


If M is an n × n matrix then $M^{(n)}$ is a 1 × 1 matrix, i.e. a number. This number is called the determinant* of the square matrix M. [Note that only square matrices have determinants.] More generally, if M is an m × n matrix then the entries of $M^{(k)}$ are the determinants of k × k matrices, namely the k × k matrices obtained by striking out all but k rows and all but k columns of M. (A matrix obtained in this way is called a ‘k × k minor’ of the matrix M.) For example, the coefficient $\partial(p, r)/\partial(S, U) = 28$ in $M^{(2)}$ above is found by ignoring q, T, V and writing

$$
\begin{aligned}
p &= 4S + 3U + \text{other terms}\\
r &= 4S + 10U + \text{other terms}\\
dp\,dr &= (4\,dS + 3\,dU + \cdots)(4\,dS + 10\,dU + \cdots) = 28\,dS\,dU + \cdots
\end{aligned}
$$

that is, by finding the second exterior power of the 2 × 2 matrix

$$\begin{pmatrix} 4 & 3\\ 4 & 10 \end{pmatrix}$$

which is a 2 × 2 minor of M.

†Note the reversal of order. M maps n-space to m-space, N maps m-space to p-space, hence NM maps n-space to p-space. $N^{(k)}$ maps forms on p-space to forms on m-space and $M^{(k)}$ maps forms on m-space to forms on n-space, hence $M^{(k)}N^{(k)}$ maps forms on p-space to forms on n-space.
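As a worked check of the multiplication above, expand the product using the rules $dS\,dS = dU\,dU = 0$ and $dU\,dS = -dS\,dU$ for products of 1-forms; the omitted terms involve dT or dV and cannot contribute to the coefficient of dS dU:

$$
\begin{aligned}
(4\,dS + 3\,dU)(4\,dS + 10\,dU) &= 16\,dS\,dS + 40\,dS\,dU + 12\,dU\,dS + 30\,dU\,dU\\
&= (40 - 12)\,dS\,dU = 28\,dS\,dU,
\end{aligned}
$$

and $28 = 4\cdot 10 - 3\cdot 4$ is exactly the determinant of the 2 × 2 minor $\begin{pmatrix} 4 & 3\\ 4 & 10 \end{pmatrix}$.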

The computation of the determinant of a k × k matrix (and hence the computation of the exterior power $M^{(k)}$ of a matrix) normally involves a prohibitive amount of arithmetic when k is at all large (say k > 3). Fortunately it is seldom necessary or even useful to carry out such a computation. What is essential is the idea of the pullback map and the fact that the exterior power $M^{(k)}$ of a matrix is the matrix of coefficients of the pullback map. (For the technique of computing determinants see Exercise 2.)

In summary, the exterior power $M^{(k)}$ is merely a new
notation with which to describe the pullback map, just 
as the matrix product was a new notation with which to 
describe compositions of affine maps. 
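To make concrete the statement that the entries of $M^{(k)}$ are k × k minors, here is a minimal Python sketch. It is an illustration under assumptions of our own, not a procedure from the text: the function names `det` and `exterior_power` are hypothetical, a matrix is a list of rows (one row per range variable, one column per domain variable), index subsets are listed in lexicographic order, and determinants are computed by cofactor expansion, which, as remarked above, is only sensible for small k.

```python
from itertools import combinations


def det(a):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(a)
    if n == 0:
        return 1  # the empty 0 x 0 determinant
    total = 0
    for j in range(n):
        # minor obtained by striking out row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(minor)
    return total


def exterior_power(m, k):
    """Return M^(k) for the matrix m (a list of rows).

    Rows of the result correspond to k-subsets of the domain variables
    (columns of m), columns to k-subsets of the range variables (rows of m),
    both in lexicographic order; each entry is the corresponding k x k minor.
    """
    n_rows, n_cols = len(m), len(m[0])
    domain_sets = list(combinations(range(n_cols), k))
    range_sets = list(combinations(range(n_rows), k))
    return [[det([[m[r][c] for c in cols] for r in rows]) for rows in range_sets]
            for cols in domain_sets]


print(exterior_power([[4, 3], [4, 10]], 2))       # [[28]]
print(exterior_power([[1, 2, 3], [4, 5, 6]], 1))  # [[1, 4], [2, 5], [3, 6]]
```

The first call reproduces the minor $\partial(p, r)/\partial(S, U) = 28$ found above; the second shows that for k = 1 the construction simply transposes the matrix.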

The Chain Rule, which gives the relationship between the operations of composition of maps and formation of pullbacks, can therefore be stated as a relationship between the operations of matrix product and exterior powers, namely

$$(NM)^{(k)} = M^{(k)}N^{(k)}. \tag{5}$$

That is, the pullback under the composed† map is the




pullback of the pullback. N and M must, of course, be 
matrices such that the composition NM is defined; that 
is, the number of columns in N must be equal to the 
number of rows in M; say M is an m × n matrix and N a p × m matrix. Two cases of (5) are particularly
important in that they are used frequently: 


(i) k = 1. Then $M^{(1)}, N^{(1)}, (NM)^{(1)}$ are merely the transposes of M, N, NM and (5) states that the
transpose of a product of two matrices is the pro- 
duct of the transposes in the reverse order. This is 
easily proved directly from the definition of the 
product of matrices. 

(ii) m = n = p = k. In this case, M, N, NM are square matrices and $M^{(k)}, N^{(k)}, (NM)^{(k)}$ are 1 × 1 matrices, i.e. numbers. These numbers are by
definition the determinants of M, N, NM re- 
spectively and (5) states that the determinant of 
the product of two square matrices is equal to the 
product of their determinants. This fact is not at 
all obvious from the definition of the product of 
matrices and the definition of the determinant of a 
square matrix, except insofar as the Chain Rule
itself is obvious from the geometrical meaning of 
pullbacks. 
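Both special cases can be spot-checked numerically. The Python sketch below is an illustration under assumptions of our own: the helper names `matmul`, `transpose` and `det` and the sample matrices are hypothetical, chosen only for the check. It verifies case (i), that the transpose of NM is the transpose of M times the transpose of N, and case (ii), that det(NM) = det(N) det(M), using exact integer arithmetic and the n!-term definition of the determinant.

```python
from itertools import permutations


def matmul(a, b):
    """Product of matrices given as lists of rows (columns of a = rows of b)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def det(a):
    """Signed sum over all permutations (n! terms, acceptable for n <= 4)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        # the sign of a permutation is (-1) to the number of inversions
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        term = (-1) ** inversions
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total


# Case (i): M is 3 x 2 (maps 2-space to 3-space), N is 2 x 3 (maps 3-space to 2-space).
M = [[1, 2], [0, -1], [3, 4]]
N = [[2, 0, 1], [-1, 5, 2]]
assert transpose(matmul(N, M)) == matmul(transpose(M), transpose(N))

# Case (ii): square matrices; the determinant of a product.
P = [[3, 4, 7], [2, 1, -2], [6, 2, -3]]
Q = [[1, 0, 2], [-1, 3, 1], [0, 5, -2]]
assert det(matmul(Q, P)) == det(Q) * det(P)
```

A numerical check of this kind is of course no proof; it only illustrates what (5) asserts in the two cases.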


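In terms of Jacobians these two instances of the rule (5) can be stated as follows: If $z_1, z_2, \dots, z_p$ are affine functions of $y_1, y_2, \dots, y_m$ and if $y_1, y_2, \dots, y_m$ are affine functions of $x_1, x_2, \dots, x_n$ then the partial derivatives $\partial z_i/\partial x_j$ of the composed function are given by

$$\frac{\partial z_i}{\partial x_j} = \sum_{\nu=1}^{m} \frac{\partial z_i}{\partial y_\nu}\,\frac{\partial y_\nu}{\partial x_j} \qquad (i = 1, 2, \dots, p;\ j = 1, 2, \dots, n). \tag{6}$$

If n = m = p, then

$$\frac{\partial(z_1, z_2, \dots, z_n)}{\partial(y_1, y_2, \dots, y_n)}\cdot\frac{\partial(y_1, y_2, \dots, y_n)}{\partial(x_1, x_2, \dots, x_n)} = \frac{\partial(z_1, z_2, \dots, z_n)}{\partial(x_1, x_2, \dots, x_n)}. \tag{7}$$

The chain rule in Jacobian notation is $\dfrac{\partial z}{\partial x} = \dfrac{\partial z}{\partial y}\,\dfrac{\partial y}{\partial x}$.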


The Chain Rule derives its name from the resemblance 
of these formulas to the Chain Rule of Differentiation 


$$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}.$$




In summary, the notion of an m × n matrix has been
defined, the operations of product and exterior power of 
matrices have been defined, the notion of the Jacobian of 
k coordinates on the range with respect to k coordinates 
of the domain relative to a given affine map has been 
defined, and the interrelation of these concepts and 
operations has been expressed by the Chain Rule (5). 


Exercises

1 (a) Write the matrix of coefficients of each of the 8 affine maps in Exercise 1, §4.1, labeling them $M_a, M_b, \dots, M_h$.

(b) Make a complete list of all products which can be 
formed from pairs of these matrices (e.g. M,M, is 
meaningful, M.M, is not). 

(c) Compute all the products in (b). 

(d) Find all exterior powers of the matrices $M_a, M_b, \dots, M_h$.

(e) Verify the chain rule (6) in all cases listed under (b). 




2 Computation of determinants. The determinant of a square matrix is denoted by writing the matrix between straight lines rather than curved ones, e.g.

$$\begin{pmatrix} 3 & 4 & 7\\ 2 & 1 & -2\\ 6 & 2 & -3 \end{pmatrix} \quad\text{and}\quad \begin{vmatrix} 3 & 4 & 7\\ 2 & 1 & -2\\ 6 & 2 & -3 \end{vmatrix}$$

denote a 3 × 3 matrix and a number respectively. The problem at hand is to develop methods of computing numbers which are given as determinants. In Chapter 1 the explicit formulas

$$\begin{vmatrix} a & b\\ a' & b' \end{vmatrix} = ab' - a'b$$

and

$$\begin{vmatrix} a & b & c\\ a' & b' & c'\\ a'' & b'' & c'' \end{vmatrix} = ab'c'' + a'b''c + a''bc' - cb'a'' - c'b''a - c''ba'$$

were found. Thus for example the determinant above is

$$3\cdot 1\cdot(-3) + 2\cdot 2\cdot 7 + 6\cdot 4\cdot(-2) - 7\cdot 1\cdot 6 - (-2)\cdot 2\cdot 3 - (-3)\cdot 4\cdot 2 = -9 + 28 - 48 - 42 + 12 + 24 = -35.$$

The analogous formula for n × n matrices contains n! terms




and is therefore out of the question computationally for large 
n. A better method of computing large determinants is to 
prove the following rules: 


(a) Interchanging any two rows or any two columns of a 
square matrix changes the sign of its determinant. 

(b) Adding any multiple of one row to another row or 
adding any multiple of one column to another column 
leaves the determinant unchanged. 

(c) Multiplying all entries of one row or all entries of one column by a number c multiplies the determinant* by c.

(d) If the first row of a matrix is 1, 0, 0, ..., 0 then its determinant is the determinant of the (n − 1) × (n − 1) matrix obtained by striking out the first row and first column of the matrix. The same is true if the first column is 1, 0, 0, ..., 0.

*Thus there is a very important distinction between multiplying a matrix by c and multiplying a determinant by c. To multiply a matrix by c means to multiply all entries by c (see §4.5), which multiplies its determinant by $c^n$.


Prove these rules. [The new matrices in (a)-(c) can be 
written as products of the given matrix with certain simple 
matrices whose determinants can be computed directly from 
the definition. The result then follows from the Chain Rule. 
The rule (d) follows directly from the definition if the first row 
and the first column are of the form 1, 0, 0,..., 0. Then use 
(b).] Using these rules the determinant above can be found by 


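$$
\begin{aligned}
\begin{vmatrix} 3 & 4 & 7\\ 2 & 1 & -2\\ 6 & 2 & -3 \end{vmatrix}
&= \begin{vmatrix} -5 & 0 & 15\\ 2 & 1 & -2\\ 6 & 2 & -3 \end{vmatrix}
= \begin{vmatrix} -5 & 0 & 15\\ 2 & 1 & -2\\ 2 & 0 & 1 \end{vmatrix}\\
&= -\begin{vmatrix} 2 & 1 & -2\\ -5 & 0 & 15\\ 2 & 0 & 1 \end{vmatrix}
= \begin{vmatrix} 1 & 2 & -2\\ 0 & -5 & 15\\ 0 & 2 & 1 \end{vmatrix}\\
&= \begin{vmatrix} -5 & 15\\ 2 & 1 \end{vmatrix}
= -5\begin{vmatrix} 1 & -3\\ 2 & 1 \end{vmatrix}
= -5\begin{vmatrix} 1 & -3\\ 0 & 7 \end{vmatrix}
= (-5)(7) = -35.
\end{aligned}
$$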
Compute the following determinants: 
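$$
\begin{vmatrix} 3 & 0 & 1\\ 1 & 2 & 5\\ -1 & 4 & 2 \end{vmatrix},\qquad
\begin{vmatrix} 3 & -1 & 5\\ -1 & 2 & 1\\ -2 & 4 & 3 \end{vmatrix},\qquad
\begin{vmatrix} -1 & 1 & 2 & 0\\ 0 & 3 & 2 & 1\\ 0 & 4 & 1 & 2\\ 3 & 1 & 5 & 7 \end{vmatrix},\qquad
\begin{vmatrix} 0 & 1 & 0 & 0\\ 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1\\ 0 & 0 & 1 & 0 \end{vmatrix},
$$

$$
\begin{vmatrix} 8 & 4 & 7 & 3\\ 2 & 4 & -5 & -7\\ 1 & 3 & 2 & 9\\ -6 & -1 & 2 & 5 \end{vmatrix}.
$$

[The last one is −2242.]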
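Rules (a) to (d) lead to a procedure needing on the order of $n^3$ arithmetic operations instead of n! terms. The Python sketch below is one possible implementation, assuming exact `Fraction` arithmetic so that no roundoff enters; the function name `det_by_elimination` is hypothetical. It uses rule (a) when a row swap is needed to obtain a nonzero pivot, rule (b) to clear the entries below each pivot, and then takes the determinant to be the product of the pivots, which is what rules (b) to (d) give for the resulting triangular matrix.

```python
from fractions import Fraction


def det_by_elimination(rows):
    """Determinant of a square matrix by row reduction with exact fractions."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    sign = 1
    result = Fraction(1)
    for k in range(n):
        # look for a row at or below k with a nonzero entry in column k
        pivot = next((i for i in range(k, n) if a[i][k] != 0), None)
        if pivot is None:
            return Fraction(0)  # column k is zero from row k down: determinant 0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]  # rule (a): a swap changes the sign
            sign = -sign
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            # rule (b): subtracting a multiple of row k leaves the determinant unchanged
            a[i] = [x - factor * y for x, y in zip(a[i], a[k])]
        result *= a[k][k]  # rules (c), (d): factor out the pivot, strike row and column k
    return sign * result


print(det_by_elimination([[3, 4, 7], [2, 1, -2], [6, 2, -3]]))  # -35
print(det_by_elimination([[8, 4, 7, 3], [2, 4, -5, -7],
                          [1, 3, 2, 9], [-6, -1, 2, 5]]))       # -2242
```

Both values agree with the text; the remaining determinants of the exercise can be checked the same way.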


3 Decomposition of a matrix. In Exercise 8, §1.3, it was shown that every affine map from a plane to a plane can be written as a composition of rotations (x, y) → (y, −x) of 90°, reflections x → −x, shears, scale factors (including 0) in coordinate directions, and translations. Prove this again and generalize to n dimensions as follows: Describe the matrices of coefficients of affine maps of n-space to n-space of the above simple types (rotations, reflections, shears, etc.). Show that it suffices to show that every n × n matrix can be written as a product of such matrices. Prove that by multiplying an n × n matrix on the left and/or right by such matrices it can be reduced to a matrix whose first row and column are 1, 0, 0, ..., 0 (unless the matrix is identically zero). Then use induction on n.

4 Show that the determinant of the transpose of a square matrix is equal to the determinant of the matrix itself. [Use Exercise 3.]


4.4 The Implicit Function Theorem for Affine Maps

The conclusions of §4.1 (pp. 83-84) concerning the results of step-by-step elimination can be summarized as follows:

Implicit Function Theorem*

*The theorem asserts that the relations (1) imply relationships of the form (2) without giving (2) explicitly, hence the name.

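A system of equations

$$y_i = \sum_{j=1}^{n} a_{ij}x_j + \text{const.} \qquad (i = 1, 2, \dots, m) \tag{1}$$

is equivalent to a system of the form

$$
\begin{aligned}
&\text{(2a)}\quad x_i = \sum_{j=1}^{r} A_{ij}y_j + \sum_{j=r+1}^{n} B_{ij}x_j + \text{const.} && (i = 1, 2, \dots, r)\\
&\text{(2b)}\quad y_i = \sum_{j=1}^{r} C_{ij}y_j + \text{const.} && (i = r+1, \dots, m)
\end{aligned}
\tag{2}
$$

if and only if the affine mapping defined by (1) has the properties that

(i) $\dfrac{\partial(y_1, \dots, y_r)}{\partial(x_1, \dots, x_r)} \ne 0$ and

(ii) the pullback of every k-form in the y's is zero for k > r.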
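Conditions (i) and (ii) together force r to be the largest k for which some k × k minor of the coefficient matrix $(a_{ij})$ is nonzero: (i) supplies a nonzero r × r minor, and (ii) says that every larger minor vanishes. The brute-force Python sketch below, with the hypothetical helper names `det`, `largest_nonzero_minor` and `satisfies_i_and_ii`, checks both conditions straight from these definitions; like the n!-term determinant formula, it is only practical for very small systems.

```python
from itertools import combinations


def det(a):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not a:
        return 1  # empty 0 x 0 matrix
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def largest_nonzero_minor(a):
    """Largest k such that some k x k minor of the m x n coefficient matrix a is nonzero."""
    m, n = len(a), len(a[0])
    for k in range(min(m, n), 0, -1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if det([[a[i][j] for j in cols] for i in rows]) != 0:
                    return k
    return 0


def satisfies_i_and_ii(a, r):
    """(i): the Jacobian of y_1..y_r with respect to x_1..x_r is nonzero;
    (ii): every Jacobian of more than r of the y's vanishes."""
    leading = det([row[:r] for row in a[:r]])
    return leading != 0 and largest_nonzero_minor(a) == r


a = [[1, 2, 3], [2, 5, 7], [3, 7, 10]]  # third row = first row + second row
print(largest_nonzero_minor(a), satisfies_i_and_ii(a, 2))
```

For this sample matrix the sketch finds r = 2 and confirms (i) and (ii). For a matrix with rows (0, 1) and (0, 2) it finds r = 1 but reports that (i) fails in the given order of the x's, which is exactly the situation handled by rearranging the x's in the proof below.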


As was pointed out in §4.1, equations (2) constitute a 




solution of the problem “given $y_1, y_2, \dots, y_m$ find all $x_1, x_2, \dots, x_n$ satisfying (1).” If $y_1, y_2, \dots, y_m$ do not satisfy (2b), then there are no such $x_1, x_2, \dots, x_n$; if $y_1, y_2, \dots, y_m$ do satisfy (2b), then there are such $x_1, x_2, \dots, x_n$ and all of these can be found by choosing $x_{r+1}, x_{r+2}, \dots, x_n$ arbitrarily and using (2a) to determine $x_1, x_2, \dots, x_r$.


Proof 


Assume first that relations of the form (2) are given which are equivalent to (1). It is to be shown that then (i) and (ii) must be satisfied. Fixing values of $x_{r+1}, x_{r+2}, \dots, x_n$ in the first r equations of (1) and in (2a) gives equivalent relations of the form

$$y_i = \sum_{j=1}^{r} a_{ij}x_j + \text{const.}, \qquad x_i = \sum_{j=1}^{r} A_{ij}y_j + \text{const.} \qquad (i = 1, 2, \dots, r).$$

The equivalence of these relations implies that if the first set of equations is used to define $(y_1, y_2, \dots, y_r)$ given a set of values $(x_1, x_2, \dots, x_r)$, and if these values of $(y_1, y_2, \dots, y_r)$ are substituted into the second set of equations, they will yield the original values of $(x_1, x_2, \dots, x_r)$. In short, the composition of these two maps $(x) \to (y) \to (x)$ is the identity map. Thus by the Chain Rule

$$\frac{\partial(x_1, x_2, \dots, x_r)}{\partial(y_1, y_2, \dots, y_r)}\cdot\frac{\partial(y_1, y_2, \dots, y_r)}{\partial(x_1, x_2, \dots, x_r)} = \frac{\partial(x_1, x_2, \dots, x_r)}{\partial(x_1, x_2, \dots, x_r)} = 1$$

where the Jacobians are the Jacobians of the two affine maps above and of the identity map. Thus $\partial(y_1, y_2, \dots, y_r)/\partial(x_1, x_2, \dots, x_r)$ cannot be zero* and (i) is proved. To prove (ii), note that by (2b) the map (1) can be written as a composition $(x_1, x_2, \dots, x_n) \to (y_1, y_2, \dots, y_r) \to (y_1, y_2, \dots, y_m)$ where the first map is the first r equations of (1) and where the second map is (2b) together with the identity map $(y_1, \dots, y_r) \to (y_1, \dots, y_r)$. By the Chain Rule the pullback under the composed map (i.e. the pullback under (1)) is the pullback of the pullback. Since any k-form with k > r in the r variables $y_1, y_2, \dots, y_r$ is zero, (ii) follows.

*ab = 1 implies a ≠ 0.

Now assume that (i) and (ii) are satisfied. It is to be




shown that then there exist relations of the form (2) 
equivalent to (1). This will be done by showing that such 
relations (2) can be derived by step-by-step elimination. 

In order to avoid difficulties arising from the need to rearrange x's or y's during the elimination process, it is useful to rearrange the equations at the outset as follows: Since the pullback of $dy_1\,dy_2 \cdots dy_r$ has a nonzero term which involves none of the factors $dx_{r+1}, dx_{r+2}, \dots, dx_n$ (namely the term $\frac{\partial(y_1, \dots, y_r)}{\partial(x_1, \dots, x_r)}\,dx_1\,dx_2 \cdots dx_r$), the pullback of $dy_1\,dy_2 \cdots dy_{r-1}$ must also have a nonzero term which involves none of the factors $dx_{r+1}, dx_{r+2}, \dots, dx_n$ (otherwise every term in the pullback of $dy_1 \cdots dy_{r-1}\,dy_r$ would involve one of them); that is, it must have a nonzero term in $dx_2\,dx_3 \cdots dx_r$ or $dx_1\,dx_3 \cdots dx_r$ or ... or $dx_1\,dx_2 \cdots dx_{r-1}$. Therefore by rearranging the first r of the x's it can be assumed that

$$\frac{\partial(y_1, y_2, \dots, y_{r-1})}{\partial(x_1, x_2, \dots, x_{r-1})} \ne 0.$$

In the same way, by rearranging the first r − 1 of the x's it can be assumed that

$$\frac{\partial(y_1, y_2, \dots, y_{r-2})}{\partial(x_1, x_2, \dots, x_{r-2})} \ne 0.$$

In the same way it follows that the first r − 2 of the x's can be rearranged (if necessary) so that the Jacobian of $y_1, y_2, \dots, y_{r-3}$ with respect to $x_1, x_2, \dots, x_{r-3}$ is nonzero, then the Jacobian of $y_1, y_2, \dots, y_{r-4}$ with respect to $x_1, x_2, \dots, x_{r-4}$, and so forth. That is, by rearranging the first r of the x's (if necessary) it can be assumed that the given system (1) satisfies the stronger assumption

$$\frac{\partial(y_1, y_2, \dots, y_k)}{\partial(x_1, x_2, \dots, x_k)} \ne 0 \qquad \text{for } k = 1, 2, \dots, r. \tag{i'}$$

It will be shown that if (i') and (ii) are satisfied then relations of the form (2) can be found by step-by-step elimination (without rearrangement).


Since $a_{11} = \partial y_1/\partial x_1$ is not zero (by (i')) the first equation of (1) can be solved for $x_1$ in terms of $y_1, x_2, x_3, \dots, x_n$ and substituted into the remaining equations to give $y_2, y_3, \dots, y_m$ in terms of $y_1, x_2, x_3, \dots, x_n$. This is the first step of the elimination process. Suppose that k steps




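of the process have been carried out to put the equations in the form

$$
\begin{aligned}
x_i &= \sum_{j=1}^{k} A_{ij}y_j + \sum_{j=k+1}^{n} B_{ij}x_j + \text{const.} && (i = 1, 2, \dots, k)\\
y_i &= \sum_{j=1}^{k} C_{ij}y_j + \sum_{j=k+1}^{n} D_{ij}x_j + \text{const.} && (i = k+1, k+2, \dots, m).
\end{aligned}
\tag{3}
$$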


The next step of the process requires that the equation for $y_{k+1}$ be solved for $x_{k+1}$, which is possible if and only if $D_{k+1,k+1} \ne 0$.

Fixing values of $x_{k+2}, x_{k+3}, \dots, x_n$ and considering the first k + 1 equations of (1) as functions of $x_1, x_2, \dots, x_{k+1}$, this map (x) → (y) can be written as a composition $(x_1, x_2, \dots, x_{k+1}) \to (y_1, y_2, \dots, y_k, x_{k+1}) \to (y_1, y_2, \dots, y_{k+1})$ where the first map is the first k equations of (1) together with $x_{k+1} = x_{k+1}$, and where the second map is $y_i = y_i$ (i = 1, 2, ..., k) together with the (k + 1)st equation of (3). Using the Chain Rule,

$$\frac{\partial(y_1, \dots, y_k, y_{k+1})}{\partial(x_1, \dots, x_k, x_{k+1})} = \frac{\partial(y_1, \dots, y_k, y_{k+1})}{\partial(y_1, \dots, y_k, x_{k+1})}\cdot\frac{\partial(y_1, \dots, y_k, x_{k+1})}{\partial(x_1, \dots, x_k, x_{k+1})} = D_{k+1,k+1}\,\frac{\partial(y_1, \dots, y_k)}{\partial(x_1, \dots, x_k)}.$$


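Thus (i') implies $D_{k+1,k+1} \ne 0$ for k = 1, 2, 3, ..., r − 1, and the process continues until k = r. At this point $D_{r+1,r+1} = 0$ by the above argument (using (ii)): with k = r the left side is the coefficient of $dx_1 \cdots dx_{r+1}$ in the pullback of $dy_1 \cdots dy_{r+1}$, which is zero by (ii), while $\partial(y_1, \dots, y_r)/\partial(x_1, \dots, x_r) \ne 0$. In fact, this conclusion holds even if $x_{r+1}, x_{r+2}, \dots, x_n$ or $y_{r+1}, y_{r+2}, \dots, y_m$ are rearranged. Hence all of the D's must be zero, that is, the equations (3) must have the desired form (2), and the Implicit Function Theorem follows.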
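The elimination in this proof can be organized as a table of coefficients in which each step exchanges the roles of one y and one x. The Python sketch below is a minimal illustration under assumptions of our own: constants are dropped (they are carried along in the same way), condition (i') is assumed so that no rearrangement is needed, and the function name `eliminate` and the sample system are hypothetical. Row i of the table gives the current expression for the i-th dependent variable; after r steps the upper-left block holds the A's of (3), the upper-right the B's, the lower-left the C's and the lower-right the D's, which the proof shows must vanish.

```python
from fractions import Fraction


def eliminate(a, r):
    """Carry out r steps of step-by-step elimination on y = a x (constants omitted).

    Assumes (i'), so the k-th step can always solve the k-th equation for x_k.
    Returns a table t: for i < r, row i gives x_{i+1}; for i >= r, row i gives
    y_{i+1}; column j < r refers to y_{j+1}, column j >= r to x_{j+1}.
    """
    t = [[Fraction(v) for v in row] for row in a]
    m, n = len(t), len(t[0])
    for k in range(r):
        p = t[k][k]  # a_11 at the first step, D_{k+1,k+1} of (3) afterwards
        if p == 0:
            raise ValueError("zero pivot: the x's would have to be rearranged first")
        new = [row[:] for row in t]
        for i in range(m):
            for j in range(n):
                if i == k and j == k:
                    new[i][j] = 1 / p                    # coefficient of y_{k+1} in x_{k+1}
                elif i == k:
                    new[i][j] = -t[k][j] / p             # rest of the solved equation
                elif j == k:
                    new[i][j] = t[i][k] / p              # y_{k+1} takes the place of x_{k+1}
                else:
                    new[i][j] = t[i][j] - t[i][k] * t[k][j] / p  # substitute for x_{k+1}
        t = new
    return t


for row in eliminate([[1, 2, 3], [2, 5, 7], [3, 7, 10]], 2):
    print([str(v) for v in row])
```

For the sample system $y_1 = x_1 + 2x_2 + 3x_3$, $y_2 = 2x_1 + 5x_2 + 7x_3$, $y_3 = 3x_1 + 7x_2 + 10x_3$ with r = 2, the rows come out as (5, −2, −1), (−2, 1, −1), (1, 1, 0), i.e. $x_1 = 5y_1 - 2y_2 - x_3$, $x_2 = -2y_1 + y_2 - x_3$ and $y_3 = y_1 + y_2$, with $D_{33} = 0$ as the proof predicts. Exact fractions are used here because, as the text notes next, roundoff is the practical weakness of this method.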


This existence proof (if (i) and (ii) then there exists (2)) 
is constructive in the sense that it tells exactly how to 
construct the relations (2) in a finite number of steps— 
first arrange the equations (1) as indicated and then 
perform step-by-step elimination. However, this finite 
number is immense, even for relatively small systems of 
equations (1), and the process prescribed by the proof is 
actually wholly impractical in most cases. Even with the 
aid of a computing machine this method of solution is 
usually inadvisable because the amount of arithmetic is 
so great as to make the error due to roundoff intolerably 




large. Therefore the proof by elimination is constructive 
only in a very theoretical sense, and no really practical 
method of constructing a solution has in fact been given 
here. More practical methods are discussed in Chapter 7. 

In theoretical work it is often useful to have a formula 
which expresses the solution in closed form. Specifically, 
the Implicit Function Theorem implies that the coefficients $a_{ij}$ determine the coefficients $A_{ij}, B_{ij}, C_{ij}$; a
solution in closed form is a formula expressing the A’s, 
B’s, and C’s as functions of the a’s. Although the result, 
known as Cramer’s Rule or the formula for the inverse of 
a matrix, is of no practical significance and will not be 
used in the remainder of this book, its proof is included 
here to demonstrate the usefulness of forms in linear 
algebra. 


Solution in Closed Form 


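Suppose that the systems of equations

$$y_i = \sum_{j=1}^{n} a_{ij}x_j + \text{const.} \qquad (i = 1, 2, \dots, m) \tag{1}$$

and

$$
\begin{aligned}
&\text{(2a)}\quad x_i = \sum_{j=1}^{r} A_{ij}y_j + \sum_{j=r+1}^{n} B_{ij}x_j + \text{const.} && (i = 1, 2, \dots, r)\\
&\text{(2b)}\quad y_i = \sum_{j=1}^{r} C_{ij}y_j + \text{const.} && (i = r+1, r+2, \dots, m)
\end{aligned}
\tag{2}
$$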


are equivalent; that is, suppose that a set of n + m numbers $(x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_m)$ satisfies (1) if
and only if it satisfies (2). The problem is to express the 
numbers A, B, C in terms of the numbers a. The method 
will be to apply the Chain Rule to a suitably chosen 
composite map. Consider the map defined by 


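$$
\begin{aligned}
x_i &= \sum_{j=1}^{r} A_{ij}u_j + \sum_{j=r+1}^{n} B_{ij}u_j + \text{const.} && (i = 1, 2, \dots, r)\\
x_i &= u_i && (i = r+1, r+2, \dots, n)\\
y_i &= u_i && (i = 1, 2, \dots, r)\\
y_i &= \sum_{j=1}^{r} C_{ij}u_j + \text{const.} && (i = r+1, r+2, \dots, m)
\end{aligned}
\tag{4}
$$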




where the constants are as in (2). The points of the image 
of this map satisfy the relations (2), so the assumption 
that (1) and (2) are equivalent implies that all points in 
the image satisfy the relations (1). Hence the y-coordinates 
of such a point can be obtained from its x-coordinates by 
using (1). In other words (4) can be written as the 
composition of the maps 


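$$
\begin{aligned}
x_i &= \sum_{j=1}^{r} A_{ij}u_j + \sum_{j=r+1}^{n} B_{ij}u_j + \text{const.} && (i = 1, 2, \dots, r)\\
x_i &= u_i && (i = r+1, r+2, \dots, n)
\end{aligned}
\tag{5}
$$

and

$$
\begin{aligned}
x_i &= x_i && (i = 1, 2, \dots, n)\\
y_i &= \sum_{j=1}^{n} a_{ij}x_j + \text{const.} && (i = 1, 2, \dots, m)
\end{aligned}
\tag{6}
$$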

where the constants are as in (1). The pullback of the n-form $dy_1\,dy_2 \cdots dy_r\,dx_{r+1}\,dx_{r+2} \cdots dx_n$ under (4) is $du_1\,du_2 \cdots du_n$, whereas if just one of the factors in this n-form is changed then the pullback is $du_1\,du_2 \cdots du_n$ multiplied by one of the numbers A, B, C to be found. The first two columns of the table list the pullbacks of such n-forms under (4). On the other hand, the pullbacks of these n-forms under (6) are found by replacing the dy's by their expressions $dy_i = \sum_j a_{ij}\,dx_j$ and multiplying out to obtain a function of the a's times $dx_1\,dx_2 \cdots dx_n$. The resulting function of the a's can be expressed as a Jacobian of the y's with respect to the x's. For example, the pullback of the form in the second row of the table is found by moving the new factor $dx_i$ to the front, which involves j − 1 interchanges and therefore a factor $(-1)^{j-1}$, by finding the pullback of the remaining (r − 1)-form in the dy's, by ignoring all terms except the term which contains none of the factors $dx_i, dx_{r+1}, dx_{r+2}, \dots, dx_n$, and finally by moving $dx_i$ to its natural position, which involves i − 1 interchanges and therefore a factor of $(-1)^{i-1}$. The result is that the pullback under (6) of the n-form in the second row is

$$(-1)^{i+j}\,\frac{\partial(y_1, \dots, \widehat{y_j}, \dots, y_r)}{\partial(x_1, \dots, \widehat{x_i}, \dots, x_r)}\,dx_1\,dx_2 \cdots dx_n,$$

where a hat over a variable means that it is omitted.




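$$
\begin{array}{lcc}
\text{The pullback of} & \text{under (4) is } du_1\,du_2\cdots du_n \text{ times} & \text{and under (6) is } dx_1\,dx_2\cdots dx_n \text{ times}\\[6pt]
dy_1\cdots dy_r\,dx_{r+1}\cdots dx_n & 1 & \dfrac{\partial(y_1,\dots,y_r)}{\partial(x_1,\dots,x_r)}\\[10pt]
dy_1\cdots \underset{(j)}{dx_i}\cdots dy_r\,dx_{r+1}\cdots dx_n,\quad i\le r,\ j\le r & A_{ij} & (-1)^{i+j}\,\dfrac{\partial(y_1,\dots,\widehat{y_j},\dots,y_r)}{\partial(x_1,\dots,\widehat{x_i},\dots,x_r)}\\[10pt]
dy_1\cdots dy_r\,dx_{r+1}\cdots \underset{(j)}{dx_i}\cdots dx_n,\quad i\le r,\ j>r & B_{ij} & -\dfrac{\partial(y_1,\dots,y_r)}{\partial(x_1,\dots,\underset{(i)}{x_j},\dots,x_r)}\\[10pt]
dy_1\cdots \underset{(j)}{dy_i}\cdots dy_r\,dx_{r+1}\cdots dx_n,\quad i>r,\ j\le r & C_{ij} & \dfrac{\partial(y_1,\dots,\underset{(j)}{y_i},\dots,y_r)}{\partial(x_1,\dots,x_r)}
\end{array}
$$

Here a factor marked (j) stands in the j-th position, in place of the factor normally found there, and a hat means that the variable is omitted.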


This result is given in the third column of the table. The 
corresponding results for the other 3 rows, also given in 
the table, are found in the same way. 

Now by the Chain Rule the pullback of the first column 
under (4) can also be written as the pullback under (5) 
of the pullback under (6). Since the pullback of $dx_1\,dx_2 \cdots dx_n$ under (5) is a constant $\dfrac{\partial(x_1, \dots, x_n)}{\partial(u_1, \dots, u_n)}$ times $du_1\,du_2 \cdots du_n$, it follows that


second column = const. · third column;


that is, the two columns are proportional, which means 
that the ratios of corresponding entries are equal. Thus 
$A_{ij}$ is equal to the ratio of the last entry of the second row of the table to the last entry of the first row, which is a function of the a's. In the same way $B_{ij}$, $C_{ij}$ can be
expressed as ratios of entries in the third column and 
hence as functions of the a’s as desired. 
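The closed-form expressions can be tried out on a small system. The Python sketch below is an illustration under assumptions of our own: the helper names `det`, `jacobian` and `closed_form` are hypothetical, indices are counted from 0 in the code, and (i) is assumed so that the first-row Jacobian J is nonzero. It computes $A_{ij}$, $B_{ij}$, $C_{ij}$ as ratios of the third-column entries of the table to the first, realizing an omitted or substituted variable by choosing the corresponding rows and columns of $(a_{ij})$.

```python
from fractions import Fraction


def det(a):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def jacobian(a, ys, xs):
    """Jacobian of the listed y's with respect to the listed x's for y = a x."""
    return det([[a[i][j] for j in xs] for i in ys])


def closed_form(a, r):
    m, n = len(a), len(a[0])
    base = list(range(r))
    J = Fraction(jacobian(a, base, base))  # first row of the table, nonzero by (i)
    # second row: omit y_j and x_i, sign (-1)^(i+j) (unchanged by 0-based indexing)
    A = [[(-1) ** (i + j) * jacobian(a, [k for k in base if k != j],
                                      [k for k in base if k != i]) / J
          for j in range(r)] for i in range(r)]
    # third row: x_j stands in the place of x_i, with a minus sign
    B = [[-jacobian(a, base, [j if k == i else k for k in base]) / J
          for j in range(r, n)] for i in range(r)]
    # fourth row: y_i stands in the place of y_j
    C = [[jacobian(a, [i if k == j else k for k in base], base) / J
          for j in range(r)] for i in range(r, m)]
    return A, B, C


A, B, C = closed_form([[1, 2, 3], [2, 5, 7], [3, 7, 10]], 2)
print([[str(v) for v in row] for row in A], [[str(v) for v in row] for row in B],
      [[str(v) for v in row] for row in C])
```

For the sample system $y_1 = x_1 + 2x_2 + 3x_3$, $y_2 = 2x_1 + 5x_2 + 7x_3$, $y_3 = 3x_1 + 7x_2 + 10x_3$ with r = 2 this gives $x_1 = 5y_1 - 2y_2 - x_3$, $x_2 = -2y_1 + y_2 - x_3$, $y_3 = y_1 + y_2$, which can be confirmed by solving the first two equations for $x_1, x_2$ directly.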

For example, if m = n = r so that $M = (a_{ij})$ is an n × n matrix with nonzero determinant, then the coefficient $A_{ij} = \partial x_i/\partial y_j$ of the inverse function (2a) is given by

$$A_{ij} = \frac{(-1)^{i+j}\det(M_{ji})}{\det(M)}$$




where $M_{ji}$ is the (n − 1) × (n − 1) minor of M obtained by striking out the column corresponding to $x_i$ and the row corresponding to $y_j$. This is the formula for the inverse of a matrix.
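For n = 2 the formula can be checked by hand. Take the illustrative case $y_1 = ax_1 + bx_2$, $y_2 = cx_1 + dx_2$ with $ad - bc \ne 0$, so that $M = \begin{pmatrix} a & b\\ c & d \end{pmatrix}$. Each minor in the formula is then the single entry left after striking out the column of $x_i$ and the row of $y_j$, and the formula gives

$$A_{11} = \frac{d}{ad - bc},\qquad A_{12} = \frac{-b}{ad - bc},\qquad A_{21} = \frac{-c}{ad - bc},\qquad A_{22} = \frac{a}{ad - bc},$$

that is,

$$\begin{pmatrix} a & b\\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b\\ -c & a \end{pmatrix},$$

and multiplying out confirms that the product with M is the identity matrix.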


Exercises 


1 Invert the equations

$$
\begin{aligned}
u &= x + 2y + 2z\\
v &= 5x + y + 3z\\
w &= -2x + 2y + z
\end{aligned}
$$


by (a) step-by-step elimination and by (b) using the formula 
for the inverse of a matrix. 


2 Suppose that the system (1) is given and that the A's, B's, and C's of (2) are known; how can the constants in (2) be determined? For example, extending Exercise 1, invert the equations

$$
\begin{aligned}
u &= x + 2y + 2z + 7\\
v &= 5x + y + 3z - 2\\
w &= -2x + 2y + z + 1.
\end{aligned}
$$


3 (a) A mapping of the form

$$
\begin{aligned}
u &= a_{11}x + a_{12}y + a_{13}z + b_1\\
v &= a_{21}x + a_{22}y + a_{23}z + b_2
\end{aligned}
$$

cannot possibly be one-to-one. How could you find two points which have the same image point?

(b) A mapping

$$
\begin{aligned}
u &= a_{11}x + a_{12}y + b_1\\
v &= a_{21}x + a_{22}y + b_2\\
w &= a_{31}x + a_{32}y + b_3
\end{aligned}
$$

cannot be onto. How would you find a point (u, v, w) which is not the image of any point (x, y)?


4 Show that an affine map of n-space to m-space cannot be 
one-to-one if n > m and cannot be onto if n < m. This 
implies the ‘geometrically obvious’ statement that such a map 
can be one-to-one and onto only if n = m. 


5 Proposition. A matrix whose entries are all integers has an inverse whose entries are all integers if and only if its determinant is ±1.


(a) Which half of this proposition is an immediate con- 
sequence of the Chain Rule? 

(b) Prove the other half using the formula for the inverse 
of a matrix. 


4.5 Abstract Vector Spaces

*‘Vector’ is the Latin word for ‘carrier’. Originally it referred to a flow (convection) represented by an arrow. It then came to mean any quantity (velocity, force) represented by an arrow, i.e., any quantity with magnitude and direction. Finally it came to refer to quantities which could be added and multiplied by numbers.

†The letter R denotes the set of real numbers.


The definitions and theorems of this section will not be 
used to any appreciable extent in the remainder of the 
book, and readers who are primarily interested in calculus 
may prefer to skip this section entirely. The theory of 
vector spaces provides essentially a new vocabulary for 
formulating the basic facts about the solution of linear 
equations. Once one becomes accustomed to the new 
terms, this vocabulary is natural, simple, and useful. It is 
used in virtually all branches of mathematics. 

A vector* space is a set in which any two elements can 
be added and any element can be multiplied by a number. 
The set of all k-forms on n-space is an excellent example
of a vector space and is the principal example studied in 
this book. To write a 2-form on 3-space as A dy dz + 
B dz dx + C dx dy means that A (a number) times dy dz 
(a 2-form on 3-space—namely, oriented area of the 
projection on the yz-plane) is another 2-form A dy dz; 
that similarly B dz dx, C dx dy are 2-forms; and that the 
sum of these 2-forms is again a 2-form. More generally, 
the sum of two k-forms and a number times a k-form are 
defined as in §4.2. 

Another natural example of a vector space is the set of 
all functions from any set S to the real numbers R,† with the operations of addition and multiplication by numbers defined in the obvious way. For example, let S be the interval {0 < x < 1} and let V be the set of all real-valued functions defined on this interval. If f and g are elements of V, that is, if f and g are real-valued functions defined on {0 < x < 1}, then f + g is again an element of V; namely, f + g is the function which assigns to the point x of {0 < x < 1} the value f(x) + g(x). This operation is so natural that one writes $f(x) = 3x^2 + 2x + 1$ without stopping to point out that one is adding the functions $3x^2$, 2x, 1. Similarly $3x^2$ is the number 3 times the function $x^2$; and, in general, if a is a number and f an element of V, then a · f denotes the element which assigns to each x in the interval {0 < x < 1} the value a · f(x). These definitions of f + g and a · f have nothing to do with the fact that S is {0 < x < 1} and serve to make V = {functions S → R} into a vector space for any set S.

An important special case is the case in which S is the finite set consisting of the first n integers S = {1, 2, ..., n}. For the sake of definiteness, let S = {1, 2, 3}. A real-valued function f: S → R is a rule which assigns a




*Another example of a subspace is the following: Intuitively, k-forms on n-space are functions assigning numbers to oriented k-dimensional surfaces in n-space (e.g. n = 3, k = 2). Sums and multiples of k-forms are k-forms; hence the vector space of k-forms on n-space is a subspace of the vector space of all functions assigning numbers to oriented k-dimensional surfaces in n-space.


real number f(1), f(2), f(3) to each of the three elements of S. It is customary to write $f_1$ instead of f(1), to write $f_2, f_3$ instead of f(2), f(3), and to describe the function f by listing its three values $f = (f_1, f_2, f_3)$. Thus, the list (7, −4, 2) represents the function which assigns the value 7 to 1, the value −4 to 2, and the value 2 to 3. In this way functions from the set S = {1, 2, 3} to R are represented simply by triples of real numbers. This set will be denoted by $V_3$ = {all functions from the set {1, 2, 3} to R}. The sum of the function f = (7, −4, 2) and the function g = (3, 2, 1) is the function which assigns to 1 the value 7 + 3, to 2 the value −4 + 2, and to 3 the value 2 + 1, i.e.


(7, −4, 2) + (3, 2, 1) = (10, −2, 3).

In short, elements of $V_3$ are added componentwise. Similarly, 2 times f = (7, −4, 2) is (14, −8, 4); that is, multiplication of elements of $V_3$ by numbers is carried out in the obvious way.

A subspace of a vector space is a subset with the 
property that sums of elements in the subset are again in 
the subset and multiples of elements of the subset are 
again in the subset. For example, the vector space of 
continuous functions on {0 < x < 1} is a subspace of 
the space of all functions on {0 < x < 1} considered 
above; that is, the sum of two continuous functions is a 
continuous function and any multiple of a continuous 
function is a continuous function. The vector space of all 
polynomial functions on {0 < x < 1} is a subspace of 
the vector space of continuous functions, and a fortiori 
of the vector space of all functions on {0 < x < 1}, 
because polynomials are continuous functions and sums 
and multiples of polynomials are polynomials. The space 
of all polynomials of degree ≤ 5, for example, is in turn a subspace of the vector space of all polynomials because sums and multiples of polynomials of degree at most five have degree at most five.* (Note that the same is not true of the set of polynomials of degree exactly five because $(3x^5 + x^2 - 1) + (-3x^5 + x^4 - 2x + 1) = x^4 + x^2 - 2x$.) An example of a subspace of the space $V_5$ (= real-valued functions on the set {1, 2, 3, 4, 5}, represented as quintuples of numbers) is the space of functions the sum of whose values is 0, i.e. all $(f_1, f_2, f_3, f_4, f_5)$ such that $f_1 + f_2 + f_3 + f_4 + f_5 = 0$. Another example is the space of all symmetric functions, i.e. all $(f_1, f_2, f_3, f_4, f_5)$ such that $f_1 = f_5$, $f_2 = f_4$. The intersection of these two subspaces of $V_5$ is again a subspace, namely the subspace of all $(f_1, f_2, f_3, f_4, f_5)$ satisfying the three relations

$$
\begin{aligned}
f_1 + f_2 + f_3 + f_4 + f_5 &= 0\\
f_1 &= f_5\\
f_2 &= f_4
\end{aligned}
\tag{1}
$$

One immediately verifies that if f, g are elements of $V_5$ satisfying any one of these relations, then f + g and a · f satisfy the same relation.

The most important concepts relating to vector spaces 
are those of basis and dimension. It was shown in §4.2 
that every 2-form on xyzt-space can be written in exactly 
one way in the form 


A dx dy + B dx dz + C dx dt + D dy dz + E dy dt + F dz dt.


This is what it means to say that the 2-forms dx dy, dx dz, 
dx dt, dy dz, dy dt, dz dt are a basis of the space of 2-forms 
on xyzt-space. The space of 2-forms on 4-space is said to 
be six-dimensional, because 2-forms are uniquely de- 
scribed by the 6 numbers (A, B, C, D, E, F). The space 
$V_3$ is three-dimensional since elements of $V_3$ are uniquely described by their three values $f_1, f_2, f_3$; if one takes $\delta_1, \delta_2, \delta_3$ to be the elements (1, 0, 0), (0, 1, 0), (0, 0, 1) of $V_3$ (that is, the three functions on S = {1, 2, 3} which are 1 on one element of S and 0 on the others), then every element of $V_3$ can be written in exactly one way in the form

$$f_1\delta_1 + f_2\delta_2 + f_3\delta_3 = f_1(1, 0, 0) + f_2(0, 1, 0) + f_3(0, 0, 1) = (f_1, f_2, f_3)$$

and $\delta_1, \delta_2, \delta_3$ are a basis of $V_3$.


In general, a basis of a vector space is a set of elements $v_1, v_2, \dots, v_n$ of the space, with the property that every element of the space can be written in exactly one way as a combination

$$x_1v_1 + x_2v_2 + \cdots + x_nv_n \tag{2}$$

where $x_1, x_2, \dots, x_n$ are numbers. When a vector space has a basis consisting of n elements it is said to be n-dimensional. The vector space of k-forms on n-space is $\binom{n}{k}$-dimensional and a basis is given by the basic k-forms in lexicographic order. The vector space $V_n$ of real-valued functions on S = {1, 2, ..., n} is n-dimensional and a basis is given by the ‘δ-functions’ $\delta_1, \delta_2, \dots, \delta_n$ where $\delta_k$ is the function whose value is 1 on k and 0 on all other integers 1, 2, ..., n. The space of all functions on {0 < x < 1} is infinite-dimensional (i.e. not n-dimensional for any integer n) because no finite number of functions can possibly constitute a basis. In fact, the space of all continuous functions and even the smaller space of all polynomial functions on {0 < x < 1} are both infinite-dimensional; for the latter space one has an infinite ‘basis’ consisting of the functions $1, x, x^2, x^3, \dots$. The space of all polynomial functions of degree ≤ 5 on {0 < x < 1} is, however, six-dimensional and a basis is given by the functions $1, x, x^2, x^3, x^4, x^5$. The subspace of $V_5$ defined by the relations (1) is 2-dimensional and a basis is given by the vectors (1, 1, −4, 1, 1) and (1, −1, 0, −1, 1). The subspace consisting of all symmetric functions on {1, 2, 3, 4, 5} is three-dimensional and a basis is given by the two vectors above together with (0, 0, 1, 0, 0). Another basis for the space of symmetric functions is given by the three vectors (1, 0, 0, 0, 1), (0, 1, 0, 1, 0), (0, 0, 1, 0, 0).

The subspace of $V_5$ which consists of all functions the sum of whose values is zero is 4-dimensional; there are many natural choices of a basis for this space, one of them being the set (1, −1, 0, 0, 0), (0, 1, −1, 0, 0), (0, 0, 1, −1, 0), (0, 0, 0, 1, −1). To prove that this is a basis it must be shown that every vector $(f_1, f_2, f_3, f_4, f_5)$ satisfying $f_1 + f_2 + f_3 + f_4 + f_5 = 0$ can be written in just one way as a sum

$$
\begin{aligned}
(f_1, f_2, f_3, f_4, f_5) &= x_1(1, -1, 0, 0, 0) + x_2(0, 1, -1, 0, 0) + x_3(0, 0, 1, -1, 0) + x_4(0, 0, 0, 1, -1)\\
&= (x_1,\ x_2 - x_1,\ x_3 - x_2,\ x_4 - x_3,\ -x_4).
\end{aligned}
$$

That is to say, the equations

$$
\begin{aligned}
f_1 &= x_1\\
f_2 &= x_2 - x_1\\
f_3 &= x_3 - x_2\\
f_4 &= x_4 - x_3\\
f_5 &= -x_4
\end{aligned}
\tag{3}
$$

have one and only one solution $(x_1, x_2, x_3, x_4)$ for each




*Because a line is described by an
equation $Ax + By = \text{const.}$, a
polynomial of the form $Ax + By$ is
called a linear form. More generally,
a polynomial in any number of
variables $A_1x_1 + A_2x_2 + \cdots + A_nx_n$
in which all terms are of degree one
is called a linear form. For this
reason, any mapping which can be
expressed by linear forms is called a
linear mapping. It will be seen in
Exercise 9 that the abstract definition
given here amounts to saying that
the mapping can be expressed by
linear forms.


†The term ‘linear operator’ or
‘operator’ is also used in certain 
contexts, principally when the range 
and domain of the map are the same 
vector space. 


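$(f_1, f_2, f_3, f_4, f_5)$ provided $\sum f_i = 0$. This follows from
the explicit solution

$$
\begin{aligned}
x_1 &= f_1\\
x_2 &= f_1 + f_2\\
x_3 &= f_1 + f_2 + f_3\\
x_4 &= f_1 + f_2 + f_3 + f_4
\end{aligned}
\tag{4}
$$

of the first four equations of (3); the fifth equation, $f_5 = -x_4 = -f_1 - f_2 - f_3 - f_4$, is then satisfied precisely because $\sum f_i = 0$.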
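The explicit solution (4) amounts to taking running sums, so the coordinates of a sum-zero vector relative to this basis can be computed mechanically. The following is a minimal Python sketch; the helper names `coordinates` and `combine` are illustrative and do not come from the text. It computes $x_1, \dots, x_4$ by (4), rejects vectors whose values do not sum to zero, and checks that recombining the coordinates returns the original vector.

```python
# Basis of the sum-zero subspace of V_5 used in the text.
BASIS = [
    (1, -1, 0, 0, 0),
    (0, 1, -1, 0, 0),
    (0, 0, 1, -1, 0),
    (0, 0, 0, 1, -1),
]

def coordinates(f):
    """Return (x1, x2, x3, x4) with f = x1*b1 + ... + x4*b4, computed by (4).

    Raises ValueError unless f is in V_5 and f1 + ... + f5 = 0.
    """
    if len(f) != 5:
        raise ValueError("expected an element of V_5")
    if sum(f) != 0:
        raise ValueError("the values must sum to zero")
    xs, running = [], 0
    for value in f[:4]:
        running += value          # x_k = f_1 + ... + f_k
        xs.append(running)
    return tuple(xs)

def combine(xs):
    """Form x1*b1 + ... + x4*b4 componentwise."""
    return tuple(sum(x * b[i] for x, b in zip(xs, BASIS)) for i in range(5))

f = (3, -1, 4, -1, -5)
xs = coordinates(f)
print(xs)                 # (3, 2, 6, 5)
print(combine(xs) == f)   # True
```

Uniqueness of the coordinates is what makes `coordinates` a well-defined function: (4) leaves no choice at any step.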


A linear* map† $f: V \to W$ from a vector space $V$ to a
vector space $W$ is a mapping which preserves the operations of addition and multiplication by numbers. That
is, $f$ carries $v_1 + v_2$ (the sum of two elements of $V$) to
$f(v_1) + f(v_2)$ (the sum of their images in $W$) and $av$
(a multiple of an element of $V$) to $af(v)$ (the same multiple
of its image in $W$). In other words, a linear map $f: V \to W$
is a function which assigns an element $f(v)$ of $W$ to each
element $v$ of $V$ in such a way that

$$
\begin{aligned}
f(v_1 + v_2) &= f(v_1) + f(v_2) && (\text{any } v_1, v_2 \text{ in } V)\\
f(av) &= af(v) && (\text{any } v \text{ in } V,\ a \text{ in } \mathbf{R}).
\end{aligned}
\tag{5}
$$
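Condition (5) can be spot-checked numerically. This Python sketch (all names hypothetical) tests (5) for a map given by linear forms, as in the marginal note, and for the same map with a constant added. Random testing can only expose a failure of linearity; it cannot prove linearity.

```python
import random

def linear_forms(a):
    """Map V_n -> V_m given by the linear forms y_i = a[i][0]*x_1 + ... + a[i][n-1]*x_n."""
    def f(x):
        return tuple(sum(aij * xj for aij, xj in zip(row, x)) for row in a)
    return f

def add(u, v):
    return tuple(ui + vi for ui, vi in zip(u, v))

def scale(c, v):
    return tuple(c * vi for vi in v)

def looks_linear(f, n, trials=100, seed=0):
    """Spot-check both conditions of (5) on random integer vectors in V_n.

    False means (5) definitely fails; True is only evidence of linearity.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        v1 = tuple(rng.randint(-9, 9) for _ in range(n))
        v2 = tuple(rng.randint(-9, 9) for _ in range(n))
        a = rng.randint(-9, 9)
        if f(add(v1, v2)) != add(f(v1), f(v2)):
            return False
        if f(scale(a, v1)) != scale(a, f(v1)):
            return False
    return True

f = linear_forms([[1, 2, 0], [0, -1, 3]])   # f: V_3 -> V_2

def g(x):
    """The same map plus the constant (1, 0): affine but not linear."""
    return add(f(x), (1, 0))

print(looks_linear(f, 3), looks_linear(g, 3))   # True False
```

Integer inputs keep every comparison exact, so a `False` result is never a rounding artifact.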

Pullback maps are linear; that is, given an affine map, 
the pullback map carrying the vector space of k-forms on 
the range to the vector space of k-forms on the domain is 
a linear map. This means that in finding the pullback of,
say, $A\,dy\,dz + B\,dz\,dx + C\,dx\,dy$ one can first find the
pullbacks of the basic forms $dy\,dz$, $dz\,dx$, $dx\,dy$ and then
multiply and add. This fact was used in §4.3 when the
pullback map was described by the matrix $M^{(k)}$ giving
the pullbacks of the basic $k$-forms.

If $V$ is the vector space of functions $\{S \to \mathbf{R}\}$ and $W$
the vector space of functions $\{T \to \mathbf{R}\}$, and if $F$ is a
function from $S$ to $T$, then composition of elements
of $W$ with $F$ defines a linear map from $W$ to $V$ called the
pullback of elements of $W$ under the map $F$. For example,
if $W$ is the vector space of real-valued functions on the
$xy$-plane, if $V$ is the vector space of functions on the
interval $\{0 < t < 1\}$, and if $F(t) = (x(t), y(t))$ is a
function from the interval $\{0 < t < 1\}$ to the $xy$-plane,
then the composed function $f(x(t), y(t))$ assigns an
element of $V$ to each element $f$ of $W$. The fact that this
map of $W$ to $V$ is linear is immediate from the definitions.
Intuitively, the pullback of $k$-forms is such a pullback:
the set $T$ being oriented $k$-dimensional surfaces in the


Chapter4 | Linear Algebra 


118 


range of an affine map, the set S being oriented k- 
dimensional surfaces in the domain, and the map F being 
the rule which assigns to each element of S its image 
under the given affine map. (The same holds for pull- 
backs of non-constant forms under non-affine maps.) 
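Because the pullback is nothing more than composition with $F$, it can be written as a higher-order function. The Python sketch below is illustrative only. The curve $F(t) = (1 - t, t^2)$ and the functions $f$, $g$ on the $xy$-plane are arbitrary choices and are not taken from the text. Exact rational arithmetic turns the linearity check into an exact equality rather than a floating-point comparison.

```python
from fractions import Fraction

def pullback(F):
    """Return the map f -> f o F, carrying functions on T to functions on S."""
    def pull(f):
        def composed(s):
            return f(F(s))
        return composed
    return pull

def F(t):
    """An illustrative curve F(t) = (x(t), y(t)) from {0 < t < 1} to the xy-plane."""
    return (1 - t, t * t)

def f(p):
    x, y = p
    return x * y + 1

def g(p):
    x, y = p
    return 3 * x - y

pull = pullback(F)
a = 5

def f_plus_a_g(p):
    """The element f + a*g of W."""
    return f(p) + a * g(p)

# Linearity: the pullback of f + a*g equals (pullback of f) + a*(pullback of g).
for t in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    assert pull(f_plus_a_g)(t) == pull(f)(t) + a * pull(g)(t)

print(pull(f)(Fraction(1, 2)))   # f(1/2, 1/4) = 9/8
```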
The map ‘derivative’ assigning functions to functions
is a linear map, as is expressed by the familiar identities

$$(f + g)' = f' + g', \qquad (af)' = af'.$$

Similarly the map ‘integral from $a$ to $b$’ assigning numbers to 1-forms on the interval $\{a < x < b\}$ is linear
because

$$
\int_a^b [f(x) + g(x)]\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx,
\qquad
\int_a^b \text{const.}\, f(x)\,dx = \text{const.} \int_a^b f(x)\,dx.
$$
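Both maps can be written down concretely on the six-dimensional space of polynomials of degree $\le 5$ with basis $1, x, \dots, x^5$, described earlier in this section. In the Python sketch below (hypothetical helper names), a polynomial is represented by its coefficient vector. The derivative becomes a $6 \times 6$ matrix, and $\int_a^b$ becomes a linear functional computed with exact fractions. The assertions check the identities above on sample polynomials.

```python
from fractions import Fraction

DEGREE = 5  # polynomials of degree <= 5, coordinates relative to 1, x, ..., x^5

def add(p, q):
    return tuple(a + b for a, b in zip(p, q))

def scale(c, p):
    return tuple(c * a for a in p)

def derivative(p):
    """d/dx: (c0, c1, ..., c5) -> (c1, 2*c2, ..., 5*c5, 0)."""
    return tuple(k * p[k] for k in range(1, DEGREE + 1)) + (0,)

def integral(p, a, b):
    """Exact integral from a to b of c0 + c1*x + ... + c5*x^5."""
    a, b = Fraction(a), Fraction(b)
    return sum(Fraction(c) * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, c in enumerate(p))

def derivative_matrix():
    """6 x 6 matrix of d/dx: column k holds the coordinates of d/dx(x^k)."""
    columns = [derivative(tuple(int(i == k) for i in range(DEGREE + 1)))
               for k in range(DEGREE + 1)]
    return [list(row) for row in zip(*columns)]

p = (1, 0, 3, 0, 0, 2)    # 1 + 3x^2 + 2x^5
q = (0, 4, 0, -1, 0, 0)   # 4x - x^3
c = 7
assert derivative(add(p, scale(c, q))) == add(derivative(p), scale(c, derivative(q)))
assert integral(add(p, scale(c, q)), 0, 1) == integral(p, 0, 1) + c * integral(q, 0, 1)
print(integral(p, 0, 1))  # 1 + 1 + 1/3 = 7/3
```

The matrix has the entries $1, 2, 3, 4, 5$ just above the diagonal and zeros elsewhere. Its last row is zero because differentiation lowers the degree.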


The same is true of the generalizations

$$
d: \{k\text{-forms}\} \to \{(k+1)\text{-forms}\}
\qquad\text{and}\qquad
\int_D: \{k\text{-forms}\} \to \{\text{numbers}\}
$$


discussed in Chapters 2, 3 and 6. Generally speaking, 
any natural mapping whose range and domain are vector 
spaces will preserve addition and multiplication by 
numbers, that is, it will be linear. 

It is useful to rephrase the definition of ‘basis’ in terms
of linear mappings. Given a basis $\{v_1, v_2, \dots, v_n\}$ of a
vector space $V$, the mapping

$$f(x_1, x_2, \dots, x_n) = x_1v_1 + x_2v_2 + \cdots + x_nv_n \tag{6}$$

carries the set of $n$-tuples of numbers to $V$. Regarding the
$n$-tuple $(x_1, x_2, \dots, x_n)$ as an element of $V_n$, this map is
a linear map $V_n \to V$ because


$$
\begin{aligned}
(x_1 + x_1')v_1 + \cdots + (x_n + x_n')v_n &= (x_1v_1 + \cdots + x_nv_n) + (x_1'v_1 + \cdots + x_n'v_n),\\
(ax_1)v_1 + (ax_2)v_2 + \cdots + (ax_n)v_n &= a(x_1v_1 + \cdots + x_nv_n),
\end{aligned}
$$


by the usual rules of distributivity, associativity, and
commutativity. By the definition of ‘basis’ this linear map
$f: V_n \to V$ is one-to-one and onto, that is, every element
$v$ of $V$ is the image of exactly one $n$-tuple $(x_1, x_2, \dots, x_n)$.
In other words, the map $f: V_n \to V$ has an inverse
$f^{-1}: V \to V_n$, and, since the statement ‘$f(x_1) = y_1$, $f(x_2) = y_2$ implies $f(x_1 + x_2) = y_1 + y_2$ and $f(ax_1) = ay_1$’ is identical to the statement ‘$x_1 = f^{-1}(y_1)$, $x_2 = f^{-1}(y_2)$ implies $x_1 + x_2 = f^{-1}(y_1 + y_2)$ and $ax_1 = f^{-1}(ay_1)$’, the inverse is linear. In short, if
$\{v_1, v_2, \dots, v_n\}$ is a basis of the vector space $V$ then
equation (6) defines an invertible linear map $V_n \leftrightarrow V$.
Conversely, if $f$ is any invertible linear map $V_n \leftrightarrow V$ then
the set

$$
\begin{aligned}
v_1 &= f(1, 0, 0, \dots, 0)\\
v_2 &= f(0, 1, 0, \dots, 0)\\
&\;\;\vdots\\
v_n &= f(0, 0, 0, \dots, 1)
\end{aligned}
$$

is a basis of $V$ because every element of $V$ can, by the
invertibility of $f$, be written in exactly one way in the
form

$$
\begin{aligned}
v &= f(x_1, x_2, \dots, x_n)\\
&= f(x_1, 0, 0, \dots, 0) + f(0, x_2, 0, \dots, 0) + \cdots + f(0, 0, 0, \dots, x_n)\\
&= x_1v_1 + x_2v_2 + \cdots + x_nv_n.
\end{aligned}
$$

Therefore the two notions are equivalent, that is, a basis
determines an invertible linear map $V_n \leftrightarrow V$ and such a
map determines a basis.
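When $V = V_n$ itself, inverting (6) means solving a square system of linear equations. The sketch below assumes that setting and uses Gauss-Jordan elimination with exact fractions, a standard method chosen here for illustration. `combine` is the map (6) and `coordinates` is its inverse; both names are hypothetical. If the given vectors are not a basis, elimination fails to find a pivot and the function reports it.

```python
from fractions import Fraction

def combine(basis, x):
    """The map V_n -> V of (6): (x_1, ..., x_n) -> x_1 v_1 + ... + x_n v_n."""
    return tuple(sum(xj * v[i] for xj, v in zip(x, basis))
                 for i in range(len(basis[0])))

def coordinates(basis, v):
    """The inverse map V -> V_n, computed by Gauss-Jordan elimination.

    Here V = V_n itself; basis is a list of n vectors of length n. Raises
    ValueError if they do not form a basis, i.e. if (6) is not invertible.
    """
    n = len(basis)
    if len(v) != n or any(len(b) != n for b in basis):
        raise ValueError("expected n vectors of length n and a vector of length n")
    # Row i of the augmented matrix is the i-th component equation.
    rows = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])]
            for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the vectors are not a basis")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [entry / lead for entry in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return tuple(row[n] for row in rows)

basis = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]
x = coordinates(basis, (2, 3, 1))
print(x)                   # the coordinates 2, 1, 0 (as Fractions)
print(combine(basis, x))   # recovers (2, 3, 1)
```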

The basic facts about the solution of linear equations 
can be summarized by the following theorem which is the 
real raison d’être of the theory of vector spaces. 


Theorem 


Canonical form for linear maps. Given a linear mapping
$f: V \to W$ of an $n$-dimensional vector space $V$ to an
$m$-dimensional vector space $W$, it is possible to choose
bases $v_1, v_2, \dots, v_n$ of $V$ and $w_1, w_2, \dots, w_m$ of $W$ such
that

$$f(x_1v_1 + x_2v_2 + \cdots + x_nv_n) = x_1w_1 + x_2w_2 + \cdots + x_rw_r$$

for some integer $r$. This integer $r$ depends only on the
map $f: V \to W$ and not on the choice of the bases $v_1, v_2, \dots, v_n$; $w_1, w_2, \dots, w_m$. It is called the rank of the
map $f$. By its definition, $r \le \min(n, m)$.




Corollary 


If $n > m$ then $f$ cannot be one-to-one ($r \le m < n$ and
$f(v_{r+1}) = f(2v_{r+1})$, whereas $v_{r+1} \ne 2v_{r+1}$), and if
$n < m$ then $f$ cannot be onto ($r \le n < m$, and $w_{r+1}$ is
not in the image of $f$). Thus if $f$ is one-to-one and onto,
then $n = m$. Taking $f: V \to V$ to be the identity map,
this implies that if $V$ is $n$-dimensional (has a basis of $n$
elements) and $m$-dimensional (has a basis of $m$ elements)
then $n = m$. In short, the dimension of a finite-dimensional vector space is well-defined.


Proof 


To say that $V$ is $n$-dimensional means that there is a basis
of $n$ elements $v_1, v_2, \dots, v_n$, and to say that $W$ is $m$-dimensional means that there is a basis of $m$ elements
$w_1, w_2, \dots, w_m$. The method of proof is to start with
such bases and to use a process of step-by-step elimination to replace them with new bases having the desired
property.
First define numbers $a_{ij}$ by

$$f(v_i) = \sum_{j=1}^{m} a_{ij}w_j \qquad (i = 1, 2, \dots, n)$$

(every element of $W$ can be expressed in just one way as
a combination of the $w$'s). Unless all of the $a$'s are zero
(in which case $r = 0$ and these bases have the desired
form) the bases can be rearranged to make $a_{11} \ne 0$. Set
$w_1' = f(v_1) = a_{11}w_1 + a_{12}w_2 + \cdots + a_{1m}w_m$. Then
$w_1', w_2, w_3, \dots, w_m$ can be shown to be a basis of $W$ as
follows: Since

$$w_1 = \frac{1}{a_{11}}\,\bigl[\,w_1' - a_{12}w_2 - \cdots - a_{1m}w_m\,\bigr]$$

can be expressed in terms of $w_1', w_2, \dots, w_m$, it follows
that every element of $W$ can be expressed in terms of
$w_1', w_2, \dots, w_m$. This expression is unique because

$$x_1w_1' + x_2w_2 + \cdots + x_mw_m = y_1w_1' + y_2w_2 + \cdots + y_mw_m$$

implies

$$x_1a_{11}w_1 + (x_2 + x_1a_{12})w_2 + \cdots = y_1a_{11}w_1 + (y_2 + y_1a_{12})w_2 + \cdots,$$

which implies $x_1a_{11} = y_1a_{11}$ and $x_i + x_1a_{1i} = y_i + y_1a_{1i}$ ($i > 1$), which in turn implies $x_1 = y_1$ (because $a_{11} \ne 0$), and,
finally, $x_i = y_i$ ($i > 1$). Therefore it can be assumed at
the outset that $w_1', w_2, \dots, w_m$ was the chosen basis of
$W$, i.e. (renaming $w_1'$ as $w_1$) that $f(v_1) = w_1$. If $f(v_i)$ for $i > 1$ contains a
non-zero term in $w_j$ for $j > 1$ then the process can be
repeated: By rearrangement it can be assumed that
$a_{22} \ne 0$. If $w_2'$ is defined by $w_2' = f(v_2)$ then the above
argument shows that $w_1, w_2', w_3, \dots, w_m$ is a basis of $W$.
Therefore it can be assumed at the outset that the chosen
basis of $W$ satisfies $f(v_1) = w_1$, $f(v_2) = w_2$. Continuing
in this way gives a new basis $w_1, w_2, \dots, w_m$ of $W$ such
that $f(v_1) = w_1, f(v_2) = w_2, \dots, f(v_r) = w_r$, and such
that $f(v_i)$ for $i > r$ contains no non-zero terms in
$w_{r+1}, w_{r+2}, \dots, w_m$. Assume this has been done and let

$$f(v_i) = \sum_{j=1}^{r} a_{ij}w_j \qquad (i > r).$$

Set $v_i' = v_i - \sum_{j=1}^{r} a_{ij}v_j$ for $i > r$. Then $v_1, v_2, \dots, v_r, v_{r+1}', \dots, v_n'$ can be shown to
be a basis of $V$ as follows: Since $v_i = v_i' + \sum_{j=1}^{r} a_{ij}v_j$ ($i > r$) can
be expressed as a combination of $v_1, v_2, \dots, v_r, v_{r+1}', \dots, v_n'$, it follows that every element of $V$ can be
expressed as a combination of these elements. This expression is unique because the assumption

$$x_1v_1 + \cdots + x_rv_r + x_{r+1}v_{r+1}' + \cdots + x_nv_n' = y_1v_1 + \cdots + y_rv_r + y_{r+1}v_{r+1}' + \cdots + y_nv_n'$$

implies first of all, when $v_i'$ is rewritten as $v_i - \sum_{j=1}^{r} a_{ij}v_j$ and the coefficients of $v_i$ ($i > r$) are compared,
that $x_{r+1} = y_{r+1}, \dots, x_n = y_n$. On the other hand,

$$f(v_i') = f(v_i) - \sum_{j=1}^{r} a_{ij}f(v_j) = \sum_{j=1}^{r} a_{ij}w_j - \sum_{j=1}^{r} a_{ij}w_j = 0 \qquad (i > r).$$

Hence when $f$ is applied to the equation above it gives

$$x_1w_1 + \cdots + x_rw_r = y_1w_1 + \cdots + y_rw_r,$$

and therefore $x_1 = y_1, \dots, x_r = y_r$. Thus $v_1, \dots, v_r, v_{r+1}', \dots, v_n'$ is a basis and, since it was just shown that

$$f(x_1v_1 + \cdots + x_rv_r + x_{r+1}v_{r+1}' + \cdots + x_nv_n') = x_1w_1 + \cdots + x_rw_r,$$

this basis has the desired property. It remains only to
show that the value of $r$ is independent of the choices.

Suppose that $v_1', v_2', \dots, v_n'$ and $w_1', w_2', \dots, w_m'$ are
bases of $V$ and $W$ such that

$$f(x_1v_1' + \cdots + x_nv_n') = x_1w_1' + \cdots + x_sw_s'.$$

It is to be shown that $s = r$. Define a linear map $V_s \to V$
by $(x_1, x_2, \dots, x_s) \mapsto x_1v_1' + \cdots + x_sv_s'$ and a linear
map $W \to V_r$ by $y_1w_1 + \cdots + y_mw_m \mapsto (y_1, y_2, \dots, y_r)$, and consider the composed map $V_s \to V \to W \to V_r$
with $f$ as the middle map. The first two maps carry $V_s$ onto the image of $f$,
which the last map carries onto $V_r$. Hence this map
carries $V_s$ onto $V_r$. But the portion of the theorem
already proved shows that if $s < r$ then $V_s \to V_r$ cannot
be onto.* Therefore $s \ge r$. By the same token $r \ge s$ and
the theorem is proved.

*Using the obvious fact that a
composition of linear maps is linear.
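The proof also identifies the kernel. In the final bases, $f(x_1v_1 + \cdots + x_rv_r + x_{r+1}v_{r+1}' + \cdots + x_nv_n') = x_1w_1 + \cdots + x_rw_r$ vanishes exactly when $x_1 = \cdots = x_r = 0$. So $v_{r+1}', \dots, v_n'$ is a basis of the set of vectors sent to zero, and that set has dimension $n - r$. The Python sketch below computes $r$ and such a basis for a map $V_n \to V_m$ given by an $m \times n$ matrix. It uses ordinary Gauss-Jordan row reduction, a standard form of step-by-step elimination, rather than transcribing the basis exchanges above line by line. All names are hypothetical.

```python
from fractions import Fraction

def rref(matrix):
    """Gauss-Jordan reduction with exact fractions; returns (rows, pivot_columns)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    m, n = len(rows), len(rows[0])
    pivots = []
    for col in range(n):
        r = len(pivots)                      # next row to receive a pivot
        if r == m:
            break
        pivot = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if pivot is None:
            continue                         # no pivot here: a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][col]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(m):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
    return rows, pivots

def rank_and_kernel(matrix):
    """Rank r of x -> Mx (M is m x n) and a basis of its kernel (n - r vectors)."""
    rows, pivots = rref(matrix)
    n = len(matrix[0])
    kernel = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -rows[i][free]            # solve the pivot equations for x_p
        kernel.append(tuple(v))
    return len(pivots), kernel

# The single relation f1 + ... + f5 = 0 on V_5; its kernel is the
# 4-dimensional subspace described at the start of this section.
r, kernel = rank_and_kernel([[1, 1, 1, 1, 1]])
print(r, len(kernel))     # 1 4
print(kernel[0])          # the vector (-1, 1, 0, 0, 0), as Fractions
```

The count `r + len(kernel) == n` always holds, which is the numerical shadow of the canonical form.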

Because most vector spaces which occur naturally are
vector spaces of functions $\{f: S \to \mathbf{R}\}$ in one guise or
another, the informal definition of vector space given in 
the beginning of this section is adequate for most 
applications. However, it is always advisable to have 
precise definitions. 


Definition 


A vector space is a set $V$ together with two operations,
addition and multiplication by numbers, satisfying certain
axioms. The addition operation assigns to each (ordered)
pair of elements $v_1, v_2$ of $V$ a third element $v_1 + v_2$ of
$V$, and multiplication by numbers assigns to each element
$v$ of $V$ and each number $a$ an element $av$ of $V$. The
axioms are:

I. Commutative law. The addition operation is commutative, i.e.

$$v_1 + v_2 = v_2 + v_1.$$

II. Associative law. Both addition and multiplication
by numbers are associative, i.e.

$$(v_1 + v_2) + v_3 = v_1 + (v_2 + v_3) \quad\text{and}\quad a_1(a_2v) = (a_1a_2)v.$$

III. Distributive law. Multiplication by numbers is
distributive over both addition of elements of $V$ and
addition of numbers, i.e.

$$a(v_1 + v_2) = av_1 + av_2 \quad\text{and}\quad (a_1 + a_2)v = a_1v + a_2v.$$

*This axiom can also be stated
‘$v_1 = av + v_2$ has a unique solution
$v$ given $v_1, v_2$ in $V$ and given a
number $a \ne 0$.’ The statement given
in the text emphasizes the relation of
the axiom to the elimination process.

IV. Solution of equations*. An equation of the form

$$v = a_1v_1 + a_2v_2 + \cdots + a_nv_n$$

in which $a_1 \ne 0$ has a unique solution $v_1$ in $V$ given
elements $v, v_2, v_3, \dots, v_n$ of $V$ and numbers $a_1, a_2, \dots, a_n$.




It is an interesting exercise to show that these axioms 
imply all the expected facts about addition and multi- 
plication by numbers, for example, that they imply: 


V. $1 \cdot v = v$ (any $v$ in $V$).
VI. $v + 0 \cdot w = v$ (any $v, w$ in $V$).
VII. The solution in IV is given by
$v_1 = (1/a_1)[v - a_2v_2 - \cdots - a_nv_n]$.

Glossary


The following terms are used in connection with vector 
spaces. 


Vector space. A vector space is a set with operations of 
addition and multiplication by numbers subject to the 
axioms I-IV above. 


Linear map. A linear map is a function $f: V \to W$ whose
range and domain are vector spaces and which preserves
the vector space operations, i.e. $f(v_1 + v_2) = f(v_1) + f(v_2)$ and $f(av) = af(v)$ for $v_1, v_2, v$ in $V$, $a$ in $\mathbf{R}$.

Subspace of a vector space. A subspace of a vector space
is a subset with the property that sums and multiples of
elements of the subset are again in the subset. By statement VII above it follows that a subspace of a vector
space is itself a vector space.

$V_n$. This symbol, for $n = 1, 2, \dots$, denotes the
standard vector space consisting of all functions from the
finite set $\{1, 2, \dots, n\}$ to $\mathbf{R}$, added and multiplied by
numbers in the natural way. Elements of $V_n$ are conveniently described by listing their $n$ values in order.

Basis. A basis of a vector space $V$ is a set of $n$ elements
$v_1, v_2, \dots, v_n$ ($n$ a positive integer) such that the linear
map $V_n \to V$ defined by $(x_1, x_2, \dots, x_n) \mapsto x_1v_1 + x_2v_2 + \cdots + x_nv_n$ is one-to-one and onto.


Dimension. A vector space is said to be n-dimensional 
if it has a basis containing n elements. (If so then the 
integer n is the same for all bases.) It is said to be 0- 
dimensional if it consists of a single element. It is said to 
be finite-dimensional if it is n-dimensional for some 
integer n = 0, 1, 2,.... Otherwise it is said to be 
infinite-dimensional. 


Canonical form. A linear map $f: V \to W$ whose range
and domain are finite-dimensional is said to be in
canonical form relative to bases $v_1, v_2, \dots, v_n$ of $V$ and
$w_1, w_2, \dots, w_m$ of $W$ if there is an integer $r$ such that

$$f(x_1v_1 + \cdots + x_nv_n) = x_1w_1 + \cdots + x_rw_r.$$

Rank. Every linear map $f: V \to W$ whose range and
domain are finite-dimensional can be put in canonical
form. The resulting integer $r$, which depends only on the
map, is called its rank.

Linearly independent. A set of vectors $v_1, v_2, \dots, v_n$ in
a vector space $V$ is said to be linearly independent if the
map $(x_1, \dots, x_n) \mapsto x_1v_1 + \cdots + x_nv_n$ of $V_n \to V$ is
one-to-one. Otherwise the set is said to be linearly
dependent.

Span. A set of vectors $v_1, v_2, \dots, v_n$ in a vector space
$V$ is said to span $V$ if the map $(x_1, \dots, x_n) \mapsto x_1v_1 + \cdots + x_nv_n$ of $V_n \to V$ is onto. Thus a basis of $V$ is a
linearly independent set of vectors which spans $V$.

Zero vector. The axioms imply that a vector space $V$
contains a unique element $0$ with the property that
$v + a \cdot 0 = v$ for any $v$ in $V$ and any number $a$. This
element is called the zero vector of $V$. Every vector space
has a unique 0-dimensional subspace, namely the subspace consisting of the zero vector alone. The zero vector
of any vector space of functions $\{f: S \to \mathbf{R}\}$ is the function which is identically zero; in particular, the zero vector of $V_n$ is $(0, 0, \dots, 0)$.

Kernel of a linear map. If $f: V \to W$ is a linear map,
then the set of all elements $v$ of $V$ whose images under $f$
are the zero vector of $W$ is a subspace of $V$. It is called the
kernel of $f$.


Exercises


1 The following sets of linear relations define subspaces of
$V_3$. In each case determine the dimension of the subspace and
find a basis.


(a) $f_1 + f_2 + f_3 = 0$

(b) $f_1 - 2f_2 + f_3 = 0$

(c) $f_1 - f_2 = 0$, $f_2 - f_3 = 0$, $f_3 - f_1 = 0$

(d) $2f_1 + f_2 - f_3 = 0$, $f_1 - 2f_2 + f_3 = 0$

(e) $4f_1 - 3f_2 + f_3 = 0$, $f_1 + f_2 - 3f_3 = 0$, $2f_1 + f_2 + f_3 = 0$


2 The following sets of linear relations define subspaces of
$V_6$. In each case determine the dimension of the subspace and
find a basis.

(a) $f_1 + f_2 + f_3 + f_4 + f_5 + f_6 = 0$

(b) $f_1 - 2f_2 + f_3 = 0$, $f_2 - 2f_3 + f_4 = 0$, $f_3 - 2f_4 + f_5 = 0$, $f_4 - 2f_5 + f_6 = 0$


3 Consider the set of all ‘mobile planar arrows’, that is, 
arrows in the plane which can be translated from point to 
point. 


(a) How can two such arrows be added? 

(b) How can such an arrow be multiplied by a number? 

(c) What is the dimension of the resulting vector space? 
(Give a basis.) 

(d) What physical quantities can be represented by such
arrows? Give a physical application of the addition
operation of (a).


4 Mobile arrows in space are a vector space in the same way 
as in Exercise 3. Give a geometrical interpretation of the state- 
ment that three arrows in space are linearly dependent (see 
Glossary). 


5 Prove that V, VI, VII are consequences of the axioms 
I-IV. [Use the uniqueness statement of IV.] 


6 Prove that the existence of a zero vector (see Glossary) is 
a consequence of the axioms I-IV and prove that the zero 
vector of any vector space of functions is the function which 
is identically zero. 


7 Prove that the kernel (see Glossary) of a linear map is a 
subspace. Restate Exercise 1 in terms of kernels. 


8 Why does VII prove that a subspace of a vector space is
a vector space?


9 If V, W are vector spaces then Hom(V, W) denotes the 
space of all linear maps from V to W. 


(a) Show that $\mathrm{Hom}(V, W)$ is a vector space when addition and multiplication by numbers are defined in the
obvious way.

(b) Is the set of all functions $\{f: S \to W\}$ from an arbitrary set $S$ to a vector space $W$ a vector space?

(c) Show that a linear map from $V_n$ to $V_m$ is a function
of the form $y_i = \sum_{j=1}^{n} a_{ij}x_j$ where $(x_1, x_2, \dots, x_n)$ and
$(y_1, y_2, \dots, y_m)$ are coordinates on $V_n$ and $V_m$
respectively. In other words, show that a linear map
is an affine map in which the constants are zero.
Elements of $\mathrm{Hom}(V_n, V_m)$ are therefore represented
by $m \times n$ matrices. What are the operations of addition and multiplication by numbers of elements of
$\mathrm{Hom}(V_n, V_m)$ in terms of matrices?

(d) What is the dimension of $\mathrm{Hom}(V_n, V_m)$? (Give a
basis.)


10 If $V$ is any vector space, then the set of all linear maps
from $V$ to the one-dimensional space $V_1$, i.e. $\mathrm{Hom}(V, V_1)$, is
a vector space by Exercise 9. It is called the dual of $V$ and is
denoted $V^* = \mathrm{Hom}(V, V_1)$.

(a) Show how a linear map $f: V \to W$ gives rise to a
pullback mapping $f^*: W^* \to V^*$.

(b) How is the dimension of $V^*$ related to that of $V$?

(c) Given a basis of $V$ show how to obtain a basis of $V^*$.
This is called the dual basis of $V$.


11 It is often convenient to represent elements of $V_n$ as
column matrices, i.e. as $n \times 1$ matrices. Then an $m \times n$
matrix gives a linear map $V_n \to V_m$ simply by multiplication
of an $n \times 1$ matrix in the usual way to obtain an $m \times 1$
matrix.

(a) How then are elements of $(V_n)^*$ represented?
(b) Given a map $V_n \to V_m$ how is the pullback map
$(V_m)^* \to (V_n)^*$ represented?


12 The ‘Fredholm Alternative’ states that a system of linear
equations $Mx = y$ in which range and domain have the same
dimension, i.e.

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= y_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= y_2\\
&\;\;\vdots\\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= y_n,
\end{aligned}
$$

has one of the two following properties: Either the associated
homogeneous equations

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0\\
&\;\;\vdots\\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= 0
\end{aligned}
$$

have only the solution $x = 0$, i.e. $(x_1, x_2, \dots, x_n) = (0, 0, \dots, 0)$, in which case the given system has a unique
solution $(x_1, x_2, \dots, x_n)$ for every $(y_1, y_2, \dots, y_n)$, or the
homogeneous equations have some solution other than $x = 0$,
in which case the given system never has a unique solution
(either there is no solution or there are many). Deduce this
statement from the canonical form for linear maps.


4.6 | Summary. Affine Manifolds 




13 The canonical form for linear maps $V_n \to V_m$ can be
stated in terms of matrices as follows: Let $M$ be an $m \times n$
matrix. Then an $m \times m$ matrix $P$ representing an invertible
(one-to-one, onto) linear map $V_m \to V_m$ and an $n \times n$ matrix
$Q$ representing an invertible linear map $V_n \to V_n$ can be
found such that

$$PMQ = C_r,$$

where the canonical matrix $C_r$ is the $m \times n$ matrix $(c_{ij})$ which
is zero in all places except $c_{11}, c_{22}, \dots, c_{rr}$, where it is one.
The process of finding such $P$, $Q$ given $M$ can be carried out
as follows: Starting with $M$, multiply on the left (by $P_1$) or on
the right (by $Q_1$) by a simple matrix (shear, interchange of
coordinates, scale factor) to obtain a new matrix $M_1$ which is
more like $C_r$. Apply the same process to $M_1$ to obtain $M_2$
more like $C_r$ and continue until $C_r$ is obtained. Going back
and collecting all the $P$'s and $Q$'s then gives an equation of the
form

$$P_kP_{k-1}\cdots P_2P_1MQ_1Q_2\cdots Q_l = C_r,$$

and hence $P$, $Q$ are obtained by multiplying. Apply this
method to find $P$, $Q$ for each of the matrices

$$
\begin{pmatrix} 1 & 1 & 1 \end{pmatrix},\quad
\begin{pmatrix} 1 & -2 & 1 \end{pmatrix},\quad
\begin{pmatrix} 1 & -1 & 0\\ 0 & 1 & -1\\ -1 & 0 & 1 \end{pmatrix},\quad
\begin{pmatrix} 2 & 1 & -1\\ 1 & -2 & 1 \end{pmatrix},\quad
\begin{pmatrix} 4 & -3 & 1\\ 1 & 1 & -3\\ 2 & 1 & 1 \end{pmatrix},
$$

$$
\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix},\quad
\begin{pmatrix} 1 & -2 & 1 & 0 & 0 & 0\\ 0 & 1 & -2 & 1 & 0 & 0\\ 0 & 0 & 1 & -2 & 1 & 0\\ 0 & 0 & 0 & 1 & -2 & 1 \end{pmatrix}
$$

of Exercises 1 and 2. Show the relationship between this
‘canonical form’ for matrices and the canonical form for
linear maps.
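The recipe in this exercise can be sketched in Python as a check on hand computations. The helper names are hypothetical. Each step is one of the simple operations named in the exercise: an interchange, a scale factor, or a shear. The step is applied to the working matrix and recorded in $P$ (left factors) or $Q$ (right factors), so the invariant $A = PMQ$ holds throughout. To avoid working the exercise itself, the example uses a matrix that is not in the list above.

```python
from fractions import Fraction

def identity(k):
    return [[Fraction(int(i == j)) for j in range(k)] for i in range(k)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def canonical_form(M):
    """Return (P, Q, r) with P*M*Q = C_r, using interchanges, scale factors and shears.

    Invariant maintained throughout: A == P * M * Q.
    """
    m, n = len(M), len(M[0])
    A = [[Fraction(x) for x in row] for row in M]
    P, Q = identity(m), identity(n)
    r = 0
    while True:
        spot = next(((i, j) for i in range(r, m) for j in range(r, n)
                     if A[i][j] != 0), None)
        if spot is None:
            return P, Q, r
        i, j = spot
        # Interchanges: rows r, i (recorded in P) and columns r, j (recorded in Q).
        A[r], A[i] = A[i], A[r]
        P[r], P[i] = P[i], P[r]
        for row in A + Q:
            row[r], row[j] = row[j], row[r]
        # Scale factor: make the pivot equal to 1.
        lead = A[r][r]
        A[r] = [x / lead for x in A[r]]
        P[r] = [x / lead for x in P[r]]
        # Shears on the left: clear the rest of column r.
        for k in range(m):
            if k != r and A[k][r] != 0:
                c = A[k][r]
                A[k] = [a - c * b for a, b in zip(A[k], A[r])]
                P[k] = [a - c * b for a, b in zip(P[k], P[r])]
        # Shears on the right: clear the rest of row r.
        for k in range(n):
            if k != r and A[r][k] != 0:
                c = A[r][k]
                for row in A + Q:
                    row[k] -= c * row[r]
        r += 1

M = [[1, 2, 3], [2, 4, 6]]
P, Q, r = canonical_form(M)
print(r)                           # 1
print(matmul(matmul(P, M), Q))     # C_1 = [[1, 0, 0], [0, 0, 0]], as Fractions
```

Exact fractions are used so that the final comparison with $C_r$ is an equality rather than a tolerance test.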


4.6 Summary. Affine Manifolds


The set of $n$-tuples of real numbers is denoted by $\mathbf{R}^n$. A
function $f: \mathbf{R}^n \to \mathbf{R}^m$, which assigns to each $n$-tuple of
real numbers an $m$-tuple of real numbers, is described by
naming the coordinates on range and domain, say
$(y_1, y_2, \dots, y_m)$ on the range and $(x_1, x_2, \dots, x_n)$ on
the domain, and by giving the $m$ component functions

$$
\begin{aligned}
y_1 &= f_1(x_1, x_2, \dots, x_n)\\
y_2 &= f_2(x_1, x_2, \dots, x_n)\\
&\;\;\vdots\\
y_m &= f_m(x_1, x_2, \dots, x_n).
\end{aligned}
\tag{1}
$$




Such a function $f: \mathbf{R}^n \to \mathbf{R}^m$ is said to be affine if each of
the component functions $f_i$ is a polynomial of the first
degree

$$f_i(x_1, x_2, \dots, x_n) = \sum_{j=1}^{n} a_{ij}x_j + b_i \qquad (i = 1, 2, \dots, m).$$

The Implicit Function Theorem deals with the question
‘given an affine function (1) and given values for
$y_1, y_2, \dots, y_m$, find all possible values for $x_1, x_2, \dots, x_n$.’ It can be stated as follows: The $x$'s and $y$'s can be
rearranged, an integer $r \ge 0$ can be chosen, and affine
functions $g: \mathbf{R}^n \to \mathbf{R}^r$, $h: \mathbf{R}^r \to \mathbf{R}^{m-r}$ can be found such
that the equations (1) are equivalent to the equations

$$
\begin{aligned}
(x_1, x_2, \dots, x_r) &= g(y_1, \dots, y_r, x_{r+1}, \dots, x_n), &&\text{(2a)}\\
(y_{r+1}, \dots, y_m) &= h(y_1, y_2, \dots, y_r); &&\text{(2b)}
\end{aligned}
$$

that is, a set of $n + m$ numbers $x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_m$ satisfies the $m$ conditions (1) if and only if it
satisfies the $m$ conditions (2). This solves the problem
‘given $y$ find $x$’ by stating that (2b) is a necessary and
sufficient condition for the existence of a solution and
that all solutions are then given by (2a).

This form of the Implicit Function Theorem was
proved in §4.1 by simple step-by-step elimination. The
more detailed version of the theorem given in §4.4 describes the integer $r$ and the possible rearrangements of
$x$'s and $y$'s for which (2) is possible in terms of pullbacks
of forms under the given map (1). The integer $r$ is the
largest integer such that the pullback of some $r$-form
under (1) is not zero. It is called the rank of the affine
map (1). The solution (2) is possible (without rearrangement) if and only if the rank is $r$ and the $dx_1\,dx_2 \dots dx_r$-component of the pullback of $dy_1\,dy_2 \dots dy_r$ is not zero.

The algebra of forms described in §4.2 is summarized
by the formula $dx_i\,dx_j = -dx_j\,dx_i$. (Geometrically, this
formula says that the rotation of $90^\circ$ which carries
$(x_i, x_j)$ to $(-x_j, x_i)$ preserves oriented areas.) Setting
$i = j$ gives $dx_i\,dx_i = 0$. The pullback operation is summarized by the formula $d\bigl(\sum_j a_jx_j + b\bigr) = \sum_j a_j\,dx_j$.

In §4.3, new notation was introduced. Among the new
symbols defined there, the most important is the Jacobian
notation

$$\frac{\partial(y_1, y_2, \dots, y_k)}{\partial(x_1, x_2, \dots, x_k)}$$

for ‘the coefficient of $dx_1\,dx_2 \dots dx_k$ in the pullback of
$dy_1\,dy_2 \dots dy_k$ under a given affine map $(x) \to (y)$’,
with an analogous notation for any set of $k$ of the $y$'s and
$k$ of the $x$'s. The $\binom{m}{k} \times \binom{n}{k}$ possible Jacobians form
the coefficients of a $\binom{m}{k} \times \binom{n}{k}$ matrix $M^{(k)}$ called the
$k$th exterior power of the $m \times n$ matrix of coefficients $M$
of the given map $(x) \to (y)$.

*Note that the usual terms imply a
dimension, e.g. line = 1-dimensional manifold, plane = 2-dimensional manifold. The word ‘manifold’
is useful precisely because it leaves
the dimension unspecified (many).

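The most important fact about computation with forms
and pullbacks is the Chain Rule: ‘The pullback under a
composed map is the pullback of the pullback.’ In terms
of Jacobians this is summarized by the formula

$$\frac{\partial(z_{i_1}, \dots, z_{i_k})}{\partial(x_{j_1}, \dots, x_{j_k})} = \sum \frac{\partial(z_{i_1}, \dots, z_{i_k})}{\partial(y_{h_1}, \dots, y_{h_k})}\,\frac{\partial(y_{h_1}, \dots, y_{h_k})}{\partial(x_{j_1}, \dots, x_{j_k})},$$

where the left side is a $k \times k$ Jacobian of the composed map
$(x) \to (y) \to (z)$ and where $\sum$ denotes a sum over all
selections $h_1 < h_2 < \cdots < h_k$ of $k$ of the coordinates $(y_1, y_2, \dots, y_m)$.
In terms of exterior powers, if $M$ and $N$ are the matrices of $(x) \to (y)$ and $(y) \to (z)$, the Chain Rule is
$(NM)^{(k)} = N^{(k)}M^{(k)}$.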


The proofs of §4.4 used only the Chain Rule. 
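These definitions are concrete enough to compute. The Python sketch below uses hypothetical helper names. It forms $M^{(k)}$ from the $k \times k$ minors of $M$, ordering the $k$-element index sets lexicographically (an assumed convention). It then finds the rank as the largest $r$ for which some $r$-form has a non-zero pullback, and checks the Chain Rule $(NM)^{(k)} = N^{(k)}M^{(k)}$ on one example. Only the coefficient matrices of the affine maps enter, since constants drop out of pullbacks of forms.

```python
from itertools import combinations

def det(a):
    """Determinant by cofactor expansion along the first row (adequate for small k)."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

def exterior_power(M, k):
    """M^(k): the entry in row I, column J is the Jacobian d(y_I)/d(x_J), a k x k minor.

    I and J run over k-element index sets in lexicographic order, so an
    m x n matrix gives a C(m, k) x C(n, k) matrix.
    """
    m, n = len(M), len(M[0])
    row_sets = list(combinations(range(m), k))
    col_sets = list(combinations(range(n), k))
    return [[det([[M[i][j] for j in J] for i in I]) for J in col_sets]
            for I in row_sets]

def rank_by_pullbacks(M):
    """Largest r such that some r x r Jacobian (pullback of some r-form) is non-zero."""
    return max([k for k in range(1, min(len(M), len(M[0])) + 1)
                if any(x != 0 for row in exterior_power(M, k) for x in row)],
               default=0)

M = [[1, 2, 0, 1],
     [0, 1, 1, 0],
     [1, 3, 1, 1]]      # third row = first + second, so the rank should be 2
N = [[1, 0, 2],
     [0, 1, -1]]        # coefficients of a second map (y) -> (z)
print(rank_by_pullbacks(M))   # 2
print(exterior_power(matmul(N, M), 2)
      == matmul(exterior_power(N, 2), exterior_power(M, 2)))   # True
```

Cofactor expansion is exponential in $k$; it is used here only because it mirrors the definition of the Jacobians directly.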

A reformulation of the Implicit Function Theorem 
which sheds considerable light on its geometrical meaning 
is the following formulation in terms of affine manifolds: 
A $k$-dimensional affine manifold in $\mathbf{R}^n$ defined by parameters is a subset of $\mathbf{R}^n$ which is the image of a one-to-one
affine map $f: \mathbf{R}^k \to \mathbf{R}^n$. For example, the line $x = 3t + 1$, $y = 7t$ in the $xy$-plane is defined by the parameter $t$. A
$k$-dimensional affine manifold in $\mathbf{R}^n$ defined by equations
is a subset of $\mathbf{R}^n$ which is a level surface of an onto affine
map $f: \mathbf{R}^n \to \mathbf{R}^{n-k}$. For example, the line $7x - 3y = 7$
is the same as the parameterized line above, since $7(3t + 1) - 3(7t) = 7$. The
dimensions can be remembered by the rule that $n - k$
independent (onto) conditions on $n$ variables leave $k$
degrees of freedom. The Implicit Function Theorem
implies (Exercise 1) that every $k$-dimensional affine
manifold in $\mathbf{R}^n$ defined by parameters can also be
defined by equations and vice versa; hence the mode of
definition is irrelevant and the notion of a $k$-dimensional
affine manifold in $\mathbf{R}^n$ is well-defined. Geometrically these
are lines* in the plane, lines in space, planes* in space,
etc.
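In the plane ($n = 2$, $k = 1$) the passage between the two modes of definition is a one-line computation. The sketch below is a hypothetical Python illustration of that special case only, not a general implementation. It converts the parameterized line $x = 3t + 1$, $y = 7t$ to an equation and back, using the fact that the coefficient vector $(a, b)$ of $ax + by = c$ must be perpendicular to the direction of the line.

```python
from fractions import Fraction

def equation_from_parameters(point, direction):
    """Line t -> point + t*direction in R^2 -> (a, b, c) with a*x + b*y = c."""
    (p0, p1), (d0, d1) = point, direction
    if d0 == 0 and d1 == 0:
        raise ValueError("direction must be non-zero so that the map is one-to-one")
    a, b = d1, -d0                      # (a, b) is perpendicular to the direction
    return a, b, a * p0 + b * p1

def parameters_from_equation(a, b, c):
    """Line a*x + b*y = c with (a, b) != (0, 0) -> (point, direction)."""
    if a == 0 and b == 0:
        raise ValueError("(a, b) must be non-zero so that the map is onto")
    point = (Fraction(c, a), Fraction(0)) if a != 0 else (Fraction(0), Fraction(c, b))
    return point, (-b, a)

print(equation_from_parameters((1, 0), (3, 7)))   # (7, -3, 7), i.e. 7x - 3y = 7
point, direction = parameters_from_equation(7, -3, 7)
# point is (1, 0) and direction is (3, 7): the parameterization x = 3t + 1, y = 7t
```

The two guards correspond to the hypotheses in the definitions: a one-to-one parameterization needs a non-zero direction, and an onto equation needs $(a, b) \ne (0, 0)$.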


*Strictly speaking it would be better
to use the word ‘manifold’ since
‘plane’ implies $k = 2$. Similarly,
‘level manifold’ would be better than
‘level surface’. Nonetheless,
‘coordinate plane’ and ‘level surface’
will be used in this book, with the
understanding that these words do
not imply ‘two-dimensional’.

If $A$ is a $k$-dimensional affine manifold in $\mathbf{R}^n$, then the
projection of $A$ on at least one of the $k$-dimensional
coordinate planes* of $\mathbf{R}^n$ is one-to-one and $A$ is parameterized by such a coordinate plane. That is, if $(x_1, x_2, \dots, x_n)$ are the coordinates on $\mathbf{R}^n$ and if the projection of $A$
on the $x_1x_2\dots x_k$-plane is one-to-one, then there is an
affine map $F: \mathbf{R}^k \to \mathbf{R}^{n-k}$ such that $A$ is the graph of $F$

$$\{(x, y) : x = (x_1, \dots, x_k),\ y = F(x)\}$$

in $\mathbf{R}^n$. For example, the line $7x - 3y = 7$ is the line
$y = \tfrac{7}{3}(x - 1)$ parameterized by $x$ or the line $x = \tfrac{3}{7}y + 1$
parameterized by $y$. Two affine manifolds in $\mathbf{R}^n$ are said
to be parallel if there is a translation of $\mathbf{R}^n$, $(x_1, x_2, \dots, x_n) \mapsto (x_1 + c_1, x_2 + c_2, \dots, x_n + c_n)$, which carries
one to the other.

In terms of these definitions the Implicit Function
Theorem has the following geometrical meaning: The
image of an affine map $f: \mathbf{R}^n \to \mathbf{R}^m$ is an affine manifold.
Its dimension $r$ is equal to the rank of the map. The level
surfaces of an affine map $f: \mathbf{R}^n \to \mathbf{R}^m$ are a family of
parallel affine manifolds in $\mathbf{R}^n$. Their dimension is $n - r$,
where $r$ is the rank of the map. The image of $f$ is parameterized by $(y_1, y_2, \dots, y_r)$ if and only if the pullback of
$dy_1\,dy_2 \dots dy_r$ is not zero. The level surfaces of $f$ are
parameterized by $(x_{r+1}, x_{r+2}, \dots, x_n)$ if and only if the
pullback of some $r$-form in $(y_1, y_2, \dots, y_m)$ contains a
non-zero term in which none of the factors $dx_{r+1}, dx_{r+2}, \dots, dx_n$ appear, i.e. a non-zero term in $dx_1\,dx_2 \dots dx_r$.


Exercises


1 Use the Implicit Function Theorem to prove that every
$k$-dimensional affine manifold defined by parameters can be
defined by equations and vice versa. [‘Onto’ means $r$ = dimension of range, ‘one-to-one’ means $r$ = dimension of domain.
A graph $\{(x, y) : y = F(x)\}$ is defined by the parameters $x$ or
by the equations $y - F(x) = 0$.]


2 Show that the rank of $M^{(k)}$ is $\binom{r}{k}$, where $r$ is the rank of $M$.
[Use the canonical form $PMQ = C_r$ of Exercise 13, §4.5.
Use the definition of ‘rank’ to show that if $P_1M_1Q_1 = M_2$,
where $P_1$, $Q_1$ are invertible, then the rank of $M_1$ is equal to
the rank of $M_2$. It suffices then to find the rank of $(C_r)^{(k)}$.]
This includes as a special case the theorem that $M$ has the
same rank as its transpose.




3 Prove the following statements relating to orientations of
$\mathbf{R}^n$ (see §1.4): A set of $n + 1$ points $P_0P_1P_2\dots P_n$ in $\mathbf{R}^n$ is
said to be in general position (non-coplanar) if it can be written as the image of the set of points $(0, 0, \dots, 0)$, $(1, 0, 0, \dots, 0)$, $(0, 1, 0, \dots, 0)$, ..., $(0, 0, \dots, 0, 1)$ (in that order)
under a one-to-one (hence onto) map $f_P: \mathbf{R}^n \to \mathbf{R}^n$. If
$P_0P_1\dots P_n$ and $Q_0Q_1\dots Q_n$ are two sets of points in $\mathbf{R}^n$ in
general position, then it is said that their ‘orientations agree’
if the affine map $f_P \circ f_Q^{-1}: \mathbf{R}^n \to \mathbf{R}^n$ carrying one set to the
other has positive Jacobian.

(a) Show that the orientations of $P_0P_1P_2\dots P_n$ and
$P_1P_0P_2\dots P_n$ do not agree.

(b) Show that if the orientations of $P_0P_1\dots P_n$ and
$Q_0Q_1\dots Q_n$ do not agree then the orientations of
$P_1P_0P_2\dots P_n$ and $Q_0Q_1\dots Q_n$ do agree.

(c) Show that if the orientations of $P_0P_1\dots P_n$ and
$Q_0Q_1\dots Q_n$ agree then $P_0P_1\dots P_n$ can be moved
continuously to $Q_0Q_1\dots Q_n$ in such a way that they
remain in general position all the while. [Applying
$f_P^{-1}$, one assumes $P_0P_1\dots P_n$ is the standard set of
$n + 1$ points. Write $f_Q$ as a composition of shears,
rotations, translations and scale factors, as in Exercise
3, §4.3. The number of negative scale factors must be
even, hence making all of them $+$ does not change $f_Q$.
This reduces (c) to four simple cases.]

(d) Show that if the orientations disagree then the motion
of (c) is impossible. [A non-zero continuous function
cannot change sign.]

(e) Show that two quadruples $(x_0, y_0, z_0), \dots, (x_3, y_3, z_3)$ and $(x_0', y_0', z_0'), \dots, (x_3', y_3', z_3')$ describe the same orientation of $xyz$-space if and only if
the determinants

$$
\begin{vmatrix} 1 & x_0 & y_0 & z_0\\ 1 & x_1 & y_1 & z_1\\ 1 & x_2 & y_2 & z_2\\ 1 & x_3 & y_3 & z_3 \end{vmatrix}
\qquad\text{and}\qquad
\begin{vmatrix} 1 & x_0' & y_0' & z_0'\\ 1 & x_1' & y_1' & z_1'\\ 1 & x_2' & y_2' & z_2'\\ 1 & x_3' & y_3' & z_3' \end{vmatrix}
$$

have the same sign.


4 Suppose that a sequence of affine maps is given

$$\{\text{pt.}\} \xrightarrow{f_1} \mathbf{R}^{n_1} \xrightarrow{f_2} \mathbf{R}^{n_2} \xrightarrow{f_3} \mathbf{R}^{n_3} \longrightarrow \cdots \xrightarrow{f_k} \mathbf{R}^{n_k} \xrightarrow{f_{k+1}} \{\text{pt.}\}$$

with the property that the image of $f_i$ is a level surface of
$f_{i+1}$ (so in particular the level surfaces of $f_2$ are points, i.e. $f_2$
is one-to-one, and the image of $f_k$ is all of $\mathbf{R}^{n_k}$, i.e. $f_k$ is onto).
Such a sequence of maps is said to be exact. Show that the
alternating sum of the dimensions is zero, i.e. $n_1 - n_2 + n_3 - \cdots \pm n_k = 0$. [Write $n_i$ as a sum of two terms and
cancel.]


Chapter 5
Differential Calculus


5.1 The Implicit Function Theorem for Differentiable Maps


Consider the problem of solving a system of $m$ equations
in $n$ unknowns

$$y_i = f_i(x_1, x_2, \dots, x_n) \qquad (i = 1, 2, \dots, m) \tag{1}$$

for all possible values of $(x_1, x_2, \dots, x_n)$ given $(y_1, y_2, \dots, y_m)$. Chapter 4 deals with the solution of this problem in the special case where the functions $f_i$ are affine
functions, that is, functions of the form


$$f_i(x_1, x_2, \dots, x_n) = a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n + b_i.$$

The solution is given by reducing the equations to the
form

$$
\begin{aligned}
x_i &= g_i(y_1, \dots, y_r, x_{r+1}, \dots, x_n) && (i = 1, 2, \dots, r) &&\text{(2a)}\\
y_i &= h_i(y_1, \dots, y_r) && (i = r + 1, \dots, m) &&\text{(2b)}
\end{aligned}
$$

by step-by-step elimination (where the functions $g_i$, $h_i$
are affine functions). The equations (2b) are then necessary and sufficient conditions for the given $(y_1, y_2, \dots, y_m)$ to be of the form $y = f(x)$ for some $x$; and, when
they are satisfied, all possible values of $(x_1, x_2, \dots, x_n)$




H.M. Edwards, Advanced Calculus: A Differential Forms Approach, Modern Birkhauser Classics, 
DOI 10.1007/978-0-8176-8412-9_5, © Harold M. Edwards 2014 


5.1 | The Implicit Function Theorem for Differentiable Maps 


*A function of $n$ variables is said to
be differentiable if its first partial
derivatives exist and are continuous.
Jacobians and pullbacks of forms
under such functions
are defined in §5.2.


†Here, and in the remainder of this
chapter, only local properties are
considered and functions are
assumed to be defined locally near
the points in question. In these
contexts the notation (1) and the
notation $f: \mathbf{R}^n \to \mathbf{R}^m$ mean that the
domain of $f$ is contained in $\mathbf{R}^n$, not
that it is necessarily all of $\mathbf{R}^n$.

For example the function 


f(x) = Zis permitted. 




can be found by choosing $x_{r+1}, x_{r+2}, \ldots, x_n$ arbitrarily and substituting in (2a). A solution of this form is possible if and only if

(i) $\dfrac{\partial(y_1, y_2, \ldots, y_r)}{\partial(x_1, x_2, \ldots, x_r)} \neq 0$

and

(ii) the pullback of every k-form is zero for k > r.

It is shown in this chapter that the same is true locally for systems (1) in which the functions $f_i$ are differentiable* functions. This is the Implicit Function Theorem for differentiable maps, which states that locally near a given solution there is a solution of the form (2a), (2b) in which $g_i$, $h_i$ are differentiable functions if and only if the conditions (i), (ii) are satisfied at all points near the given point.


Implicit Function Theorem 


Let

(1) $y_i = f_i(x_1, x_2, \ldots, x_n)$ $(i = 1, 2, \ldots, m)$

be a system of equations in which the functions $f_i$ are defined† and (continuously) differentiable near the point $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ in $\mathbb{R}^n$, and let $(\bar y_1, \bar y_2, \ldots, \bar y_m)$, where

$\bar y_i = f_i(\bar x_1, \bar x_2, \ldots, \bar x_n)$ $(i = 1, 2, \ldots, m)$,

be the corresponding point in $\mathbb{R}^m$. If the conditions

(i) $\dfrac{\partial(y_1, y_2, \ldots, y_r)}{\partial(x_1, x_2, \ldots, x_r)} \neq 0$ at $(\bar x_1, \bar x_2, \ldots, \bar x_n)$

and

(ii) the pullback of every k-form for k > r is identically zero near $(\bar x_1, \bar x_2, \ldots, \bar x_n)$

are satisfied, then there are differentiable functions $g_i$, $h_i$ and a number ε > 0 such that the relations (1) and the relations

(2) $x_i = g_i(y_1, \ldots, y_r, x_{r+1}, \ldots, x_n)$ $(i = 1, 2, \ldots, r)$
$y_i = h_i(y_1, \ldots, y_r)$ $(i = r+1, \ldots, m)$

are defined and equivalent at all points $(x_1, \ldots, x_n, y_1, \ldots, y_m)$ of $\mathbb{R}^{n+m}$ within ε of $(\bar x_1, \ldots, \bar x_n, \bar y_1, \ldots, \bar y_m)$, i.e. at all points satisfying

$|x_i - \bar x_i| < \varepsilon$ $(i = 1, 2, \ldots, n)$
$|y_i - \bar y_i| < \varepsilon$ $(i = 1, 2, \ldots, m)$.

That is, the functions $f_i$, $g_i$, $h_i$ are all defined at such points and the relations (1) are satisfied if and only if the relations (2) are satisfied. Conversely, if (1) can be reduced in this way to the form (2) near $(\bar x_1, \ldots, \bar x_n, \bar y_1, \ldots, \bar y_m)$ then the conditions (i) and (ii) must be satisfied.


An important difference between the Implicit Function Theorem for affine maps and the present theorem is the possibility of singularities in the differentiable case. Given an affine system (1) it is always possible to rearrange the variables $x_i$ and the equations $y_i$ in such a way that conditions (i) and (ii) are satisfied, but such a rearrangement is not always possible for differentiable systems (1). (If (i) and (ii) are to be satisfied, then r must be the largest integer such that the pullback of some r-form is not identically zero; it follows that the x's and y's can be rearranged so that $\partial(y_1, y_2, \ldots, y_r)/\partial(x_1, x_2, \ldots, x_r)$ is not identically zero, but this Jacobian is a function of $(x_1, x_2, \ldots, x_n)$ and the statement that it is not identically zero does not imply (i). If the system (1) is affine, then the Jacobians are constant and the statement that $\partial(y_1, y_2, \ldots, y_r)/\partial(x_1, x_2, \ldots, x_r)$ is not identically zero does imply (i).) A point $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ is called a singularity of (1) if there is no solution of the form (2) no matter how the x's and y's are rearranged. The following examples illustrate the theorem and show that singularities occur even for very simple systems (1).

As a first example, consider the mapping $f: \mathbb{R}^2 \to \mathbb{R}$ defined by the equation

(3) $y = u^2 + v^2$.

Since

$dy = 2u\,du + 2v\,dv$,

the relationship (3) can, by the Implicit Function Theorem, be written in the form

$u = g(y, v)$

locally near any point $(\bar u, \bar v, \bar u^2 + \bar v^2)$ where $\partial y/\partial u = 2u$ is not zero, i.e. where $\bar u \neq 0$. Similarly, (3) can be written in the form

$v = g(y, u)$

locally near any point where $\bar v \neq 0$. All points $(\bar u, \bar v, \bar u^2 + \bar v^2)$ are covered by at least one of these two cases except for the point $\bar u = 0$, $\bar v = 0$. This point is, by the above definition, a singularity.

[Figure: the circles y = const. and the lines v = const. in the uv-plane]

The geometrical significance of these solutions can easily be seen. Let $(\bar u, \bar v)$ be a given point other than the singularity (0, 0). The set where $y = \bar y$ is the circle whose center is (0, 0) and which passes through $(\bar u, \bar v)$. Consider this circle and the horizontal line $v = \bar v$ through $(\bar u, \bar v)$. Circle and line intersect in two points unless $\bar u = 0$, in which case the line is tangent to the circle and there is only one point of intersection (or, as it is sometimes stated, the two points of intersection coincide). Assume now that $\bar u \neq 0$. It is clear geometrically that if $\bar y$ is changed slightly and $\bar v$ is changed slightly, then the corresponding circle and line will still intersect in two points, one of them near $(\bar u, \bar v)$ and the other (relatively) far away. In this way, every (y, v) near $(\bar y, \bar v)$ determines a point of the uv-plane near $(\bar u, \bar v)$. The u-coordinate of this point is therefore a function of (y, v) near $(\bar y, \bar v)$, and this is the solution u = g(y, v). It is clear that there is no such solution u = g(y, v) if $\bar u = 0$, because then a small change in circle and line can result in either no point of intersection or in two points of intersection equally near to $(\bar u, \bar v)$.

In this simple case the function g(y, v) can be given explicitly, namely

(4) $g(y, v) = \begin{cases} \sqrt{y - v^2} & \text{if } \bar u > 0, \\ -\sqrt{y - v^2} & \text{if } \bar u < 0. \end{cases}$

These functions are defined and differentiable provided $y - v^2 > 0$. Thus g(y, v) is defined at all points (y, v) near $(\bar y, \bar v)$, and '$u = g(y, v)$' is equivalent to '$y = u^2 + v^2$ and u has the same sign as $\bar u$'. Therefore $u = g(y, v)$ is defined and equivalent to $y = u^2 + v^2$ for all (u, v, y) sufficiently near $(\bar u, \bar v, \bar y)$.
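The explicit solution (4) can be checked numerically. The following is a minimal Python sketch using only the standard `math` module; the function name `g` and the sample point $(\bar u, \bar v) = (-0.6, 0.8)$ are illustrative choices, not part of the text. The sign of the square root is taken from $\bar u$, exactly as in (4).

```python
import math

def g(y, v, u_bar):
    """Local solution u = g(y, v) of y = u**2 + v**2 near a point with u_bar != 0.

    The sign of the square root is the sign of u_bar, as in formula (4);
    g is defined only where y - v**2 > 0.
    """
    if u_bar == 0:
        raise ValueError("no solution u = g(y, v) near a point with u_bar = 0")
    radicand = y - v * v
    if radicand <= 0:
        raise ValueError("g is defined only where y - v**2 > 0")
    root = math.sqrt(radicand)
    return root if u_bar > 0 else -root

# Base point (u_bar, v_bar) = (-0.6, 0.8), so y_bar = 1.0; perturb (y, v) slightly.
u_bar, v_bar = -0.6, 0.8
y, v = 1.01, 0.79
u = g(y, v, u_bar)
print(u, u * u + v * v)  # u stays near u_bar, and u**2 + v**2 reproduces y up to rounding
```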

As a second example, consider the mapping $f: \mathbb{R}^2 \to \mathbb{R}^2$ defined by

(5) $x = uv$
$y = u^2 + v^2$.

[Figure: the level curves x = const. and y = const. in the uv-plane]


Here

$dx\,dy = (u\,dv + v\,du)(2u\,du + 2v\,dv) = 2(v^2 - u^2)\,du\,dv.$

Thus r = 2 because this 2-form is not identically zero near any point (u, v). However, it is equal to zero at all points along the lines $u = \pm v$ and these points are singularities; that is, (i) and (ii) cannot be satisfied by rearranging variables. At all other points they are satisfied and the theorem states that locally the equations (5) can be solved to give

(6) $u = g_1(x, y)$
$v = g_2(x, y)$

for (x, y) near $(\bar x, \bar y)$ and (u, v) near $(\bar u, \bar v)$.
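At a single point, the pullbacks of the k-forms $dy_{i_1}\cdots dy_{i_k}$ have the k×k minors of the matrix $(\partial y_i/\partial x_j)$ as coefficients, so the largest r for which the pullback of some r-form is not zero at that point equals the rank of that matrix. The Python sketch below (helper names are hypothetical) computes this rank exactly with `fractions.Fraction` for the map (5). It shows the rank at a point dropping to 1 on the lines $u = \pm v$, and to 0 at the origin, even though $dx\,dy$ is not identically zero near any point; at the points of $u = \pm v$ other than the origin, (i) fails for r = 2 while (ii) fails for r = 1.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix with rational entries, by exact Gaussian elimination."""
    m = [[Fraction(a) for a in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # find a row at or below `rank` with a nonzero entry in this column
        pivot = next((i for i in range(rank, n_rows) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, n_rows):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

def jacobian_map5(u, v):
    """Partial derivatives of x = u*v and y = u**2 + v**2 with respect to (u, v)."""
    return [[v, u], [2 * u, 2 * v]]

for point in [(1, 2), (1, 1), (3, -3), (0, 0)]:
    print(point, matrix_rank(jacobian_map5(*point)))
```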

By sketching the level curves x = const. (hyperbolae with axes $u = \pm v$, asymptotes u = 0, v = 0) and y = const. (circles with center the origin) it can be seen that, given $(\bar u, \bar v)$, the corresponding level curves $x = \bar x$ and $y = \bar y$ intersect in four points unless $\bar u = \pm\bar v$, in which case they intersect in two points of tangency (four points coincident in pairs). Excluding $\bar u = \pm\bar v$, it is clear geometrically that for x near $\bar x$ and y near $\bar y$ the curves x = const., y = const. intersect in exactly one point near $(\bar u, \bar v)$ as well as in three other points which are (relatively) far away. The uv-coordinates of this point of intersection are therefore functions of x, y for x near $\bar x$ and y near $\bar y$; hence (6).

In this case, an explicit solution is more difficult to give than in the previous case. Step-by-step elimination gives first

$v = \dfrac{x}{u}$
$y = u^2 + \left(\dfrac{x}{u}\right)^2$

and then, solving for u,

$u^2 y = u^4 + x^2$
$u^2 = \dfrac{y \pm \sqrt{y^2 - 4x^2}}{2}$

and for v,

$v^2 = \dfrac{x^2}{u^2} = \dfrac{2x^2}{y \pm \sqrt{y^2 - 4x^2}} = \dfrac{2x^2\left(y \mp \sqrt{y^2 - 4x^2}\right)}{y^2 - (y^2 - 4x^2)} = \dfrac{y \mp \sqrt{y^2 - 4x^2}}{2}$,

giving the final result

(6′) $u = \pm\sqrt{\dfrac{y \pm \sqrt{y^2 - 4x^2}}{2}}$, $v = \pm\sqrt{\dfrac{y \mp \sqrt{y^2 - 4x^2}}{2}}$,

where the formula for u involves two choices of sign and where the signs in v are then determined using v = x/u. For each (x, y) the formula (6′) gives four points (u, v) as expected; the signs must then be determined in such a way as to select the one near $(\bar u, \bar v)$.
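Formula (6′) and the rule for choosing signs translate directly into code. The following is a minimal Python sketch, assuming $x \neq 0$ (so that all four preimages are distinct and $u \neq 0$) and $y^2 \ge 4x^2$; `preimages` and `local_solution` are hypothetical names. The local solution (6) is taken to be the preimage closest to $(\bar u, \bar v)$, which is a reasonable selection rule only when (x, y) is close to $(\bar x, \bar y)$ and $\bar u \neq \pm\bar v$.

```python
import math

def preimages(x, y):
    """All real (u, v) with u*v = x and u**2 + v**2 = y, following formula (6')."""
    if x == 0:
        raise ValueError("this sketch assumes x != 0")
    disc = y * y - 4 * x * x
    if disc < 0:
        raise ValueError("(x, y) is not in the image of the map (5)")
    s = math.sqrt(disc)
    points = []
    for inner in (+1, -1):            # inner sign in u**2 = (y +/- s) / 2
        u_sq = (y + inner * s) / 2    # positive because x != 0 forces s < y
        for outer in (+1, -1):        # outer sign of u
            u = outer * math.sqrt(u_sq)
            points.append((u, x / u))  # the sign of v is fixed by v = x / u
    return points

def local_solution(x, y, u_bar, v_bar):
    """Pick the preimage nearest (u_bar, v_bar): the local inverse (6)."""
    return min(preimages(x, y),
               key=lambda p: (p[0] - u_bar) ** 2 + (p[1] - v_bar) ** 2)

# Base point (u_bar, v_bar) = (2, 1) maps to (x_bar, y_bar) = (2, 5).
print(local_solution(2.05, 5.1, 2.0, 1.0))
```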

This explicit solution, cumbersome as it is, is possible only because the mapping (5) is particularly simple. If the fourth-degree equation for u had involved $u^3$ and u as well as $u^4$, $u^2$, then it could not have been solved by means of the quadratic formula, and the final solution (6′) would have been very much more complicated. From this it is clear that an explicit algebraic solution of equations of the form

(7) $x = p_1(u, v)$
$y = p_2(u, v)$,

where $p_1$ and $p_2$ are polynomials, will not be feasible in general. It can be shown that in fact the solution of (7) for (u, v) as functions of (x, y), even when $p_1$ and $p_2$ are polynomials of relatively low degree, is in general impossible in the sense that the values of (u, v) cannot be obtained from those of (x, y) by arithmetic operations and the extraction of roots. Nonetheless, the Implicit Function Theorem proves that locally the solution exists in the sense that if a point $(\bar u, \bar v, \bar x, \bar y)$ is given with $\bar x = p_1(\bar u, \bar v)$, $\bar y = p_2(\bar u, \bar v)$, then every point (x, y) near $(\bar x, \bar y)$ is the image of one and only one point (u, v) near $(\bar u, \bar v)$ under (7) (unless $(\bar u, \bar v)$ is a singularity of (7)), and that the mapping $(x, y) \mapsto (u, v)$ so defined is differentiable. This statement about the existence of a solution is all the more important in cases where the solution cannot be found explicitly, since it gives information about the mapping (7) which could not be derived by algebra alone.
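Even when no formula for the local inverse of (7) exists, its values can still be computed. As an illustration only, the Python sketch below applies Newton's method (a standard numerical technique, not one used in the text) starting from $(\bar u, \bar v)$; the polynomials $p_1 = u^3 + v$, $p_2 = u + v^3 + uv$ and all function names are hypothetical. Each step solves a linear system whose matrix is the matrix of partial derivatives, so the method requires $\partial(x, y)/\partial(u, v) \neq 0$ along the way.

```python
def newton_local_inverse(p, jac, target, start, tol=1e-12, max_iter=50):
    """Solve p(u, v) = target for (u, v) near `start` by Newton's method."""
    u, v = start
    x, y = target
    for _ in range(max_iter):
        f1, f2 = p(u, v)
        r1, r2 = f1 - x, f2 - y                  # residual p(u, v) - target
        if abs(r1) < tol and abs(r2) < tol:
            return u, v
        (a, b), (c, d) = jac(u, v)               # matrix of partials at (u, v)
        det = a * d - b * c                      # Jacobian d(x, y)/d(u, v)
        if det == 0:
            raise ZeroDivisionError("Jacobian is zero: cannot take a Newton step")
        # Solve [[a, b], [c, d]] (du, dv) = -(r1, r2) by Cramer's rule.
        du = (-r1 * d + r2 * b) / det
        dv = (-a * r2 + c * r1) / det
        u, v = u + du, v + dv
    raise RuntimeError("Newton's method did not converge from this start")

# Hypothetical instance of (7): p1 = u**3 + v, p2 = u + v**3 + u*v.
def p(u, v):
    return u**3 + v, u + v**3 + u * v

def jac(u, v):
    return (3 * u**2, 1.0), (1 + v, 3 * v**2 + u)

# Base point (u_bar, v_bar) = (1, 1) maps to (x_bar, y_bar) = (2, 3); move (x, y) slightly.
print(newton_local_inverse(p, jac, (2.1, 2.9), (1.0, 1.0)))
```

Like the theorem itself, this is local: Newton's method is only expected to converge when the starting point is close enough to the desired solution.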
For the case of a single function of a single variable 


(8) y = f(x) 


the conclusions of the Implicit Function Theorem are 
familiar facts of elementary calculus. If n = m = r = 1, 




*The statement that the inverse function is to be differentiable is crucial here. The function $y = x^3$ has an inverse function $x = \sqrt[3]{y}$ valid for all y, but this function is not, of course, differentiable at y = 0.


[Figure: graph of y = f(x) and its inverse x = g(y)]


then the theorem states that a (continuously) differentiable function (8) admits a differentiable inverse function

(9) $x = g(y)$

near a point $\bar y = f(\bar x)$ if and only* if the pullback $dy = f'(x)\,dx$ is not zero at $\bar x$, that is, if and only if the derivative of f is not zero at $\bar x$. On the other hand, if the derivative is identically zero near $\bar x$ then r = 0, equation (2a) is not present, and equation (2b) gives y as a 'function of no variables'. That is, y = const. and the statement reduces to the familiar fact that a function of one variable is constant if and only if its derivative is identically zero. The function (8) has a singularity at $\bar x$ when $f'(\bar x) = 0$ but f′ is not identically zero near $\bar x$. Such a point is also called a 'critical point' of f, for reasons discussed in §5.4. The fact that a simple function y = f(x) can have a very complicated inverse function is also familiar. For example, the function

$y = 2x^3 + x^2 + 2x - 1$

has a positive derivative for all x, which implies that the function always increases, and hence that each y is the image of one and only one x, even though the explicit solution for x as a function of y is not at all simple.
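The claim about $y = 2x^3 + x^2 + 2x - 1$ is easy to confirm: its derivative $6x^2 + 2x + 2$ has discriminant $4 - 48 < 0$, so it is always positive. Because the function is strictly increasing, its inverse can be evaluated by bisection even though no simple formula is available. A minimal Python sketch; the names `f` and `inverse_f` are illustrative.

```python
def f(x):
    """y = 2x^3 + x^2 + 2x - 1, strictly increasing since f'(x) = 6x^2 + 2x + 2 > 0."""
    return 2 * x**3 + x**2 + 2 * x - 1

def inverse_f(y):
    """Return the unique x with f(x) = y, found by bisection."""
    lo, hi = -1.0, 1.0
    while f(lo) > y:          # widen the bracket downward until f(lo) <= y
        lo *= 2
    while f(hi) < y:          # widen the bracket upward until f(hi) >= y
        hi *= 2
    for _ in range(200):      # more halvings than double precision can resolve
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(inverse_f(4.0), inverse_f(0.0))
```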

A different type of singularity is presented by the map $f: \mathbb{R} \to \mathbb{R}^2$ defined by

(10) $x = t^3$
$y = t^2$.

Here the solutions would be of the form

$t = g(x)$, $y = h(x)$ or $t = g(y)$, $x = h(y)$.

There is a solution of the first type provided $dx = 3t^2\,dt \neq 0$ and a solution of the second type provided $dy = 2t\,dt \neq 0$. Thus there is a solution of either type near a point $\bar t \neq 0$, but the point $\bar t = 0$ is a singularity. The explicit solutions are

(11) $t = x^{1/3}$, $y = x^{2/3}$ or $t = \pm y^{1/2}$, $x = \pm y^{3/2}$,

the first being valid for $x \neq 0$, the second being valid for y > 0 when the sign is properly chosen. A sketch of the curve $x^2 = y^3$ shows the meaning of these solutions and shows that there is indeed a 'singularity' at (0, 0).

*The symbol Arccos x, defined for x a real number satisfying $|x| \le 1$, denotes the unique real number y in the interval $\{0 \le y \le \pi\}$ such that $x = \cos y$. The symbol Arcsin x is defined similarly.

As a final application of the Implicit Function Theorem consider the map $f: \mathbb{R}^2 \to \mathbb{R}^2$ given by

(12) $x = \cos(u + v)$
$y = \sin(u + v)$.

Since dx dy = 0 the integer r in any reduction must be ≤ 1. On the other hand,

$dx = -\sin(u + v)\,du - \sin(u + v)\,dv$

has neither component equal to zero except on the lines where u + v is a multiple of π; hence there are solutions

$u = g(x, v)$, $y = h(x)$ or $v = g(x, u)$, $y = h(x)$

near points $(\bar u, \bar v)$ with $\bar u + \bar v \neq n\pi$. Similarly there are solutions

$u = g(y, v)$, $x = h(y)$ or $v = g(y, u)$, $x = h(y)$

near points $(\bar u, \bar v)$ with $\bar u + \bar v \neq n\pi + \frac{\pi}{2}$. Note that at least one of the two is valid at every $(\bar u, \bar v)$. Hence there are no singularities. Explicit solutions can be given in terms of the inverse trigonometric functions Arccos* and Arcsin, e.g.

$u = \pm\operatorname{Arccos} x - v + 2n\pi$
$y = \pm\sqrt{1 - x^2}$

where the integer n and the sign ± (the same in both equations) depend on the point $(\bar u, \bar v)$.
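For the map (12), a local solution u = g(x, v) near a point with $\sin(\bar u + \bar v) \neq 0$ can be written $u = s\,\operatorname{Arccos} x - v + 2n\pi$, where the sign $s = \pm 1$ is the sign of $\sin(\bar u + \bar v)$ and the integer n is fixed by requiring the formula to return $\bar u$ at the base point. A Python sketch under those assumptions; `math.acos` plays the role of Arccos, and `local_u` is a hypothetical name.

```python
import math

def local_u(x, v, u_bar, v_bar):
    """Local solution u = g(x, v) of x = cos(u + v) near (u_bar, v_bar).

    Uses u = s*Arccos(x) - v + 2*n*pi, with the sign s and the integer n
    fixed once from the base point.  Requires sin(u_bar + v_bar) != 0.
    """
    w_bar = u_bar + v_bar
    if math.sin(w_bar) == 0:
        raise ValueError("u + v is a multiple of pi: no solution u = g(x, v) here")
    s = 1 if math.sin(w_bar) > 0 else -1
    # choose n so that s*Arccos(cos(w_bar)) + 2*n*pi equals w_bar
    n = round((w_bar - s * math.acos(math.cos(w_bar))) / (2 * math.pi))
    return s * math.acos(x) - v + 2 * n * math.pi

# Base point (u_bar, v_bar) = (2, 3), where sin(5) < 0; move x slightly, keep v fixed.
print(local_u(math.cos(5.05), 3.0, 2.0, 3.0))
```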

If the point $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ is not a singularity of (1) then, by definition, the x's and y's can be rearranged (if necessary) so that there is a solution of the form (2). If this is the case then the integer r is determined by the map (1) (as the largest integer such that the pullback of some r-form is not zero at $(\bar x_1, \bar x_2, \ldots, \bar x_n)$) and is independent of the particular solution (2). It is called the rank of (1) at $(\bar x_1, \bar x_2, \ldots, \bar x_n)$, and (1) is said to be non-singular of rank r at $(\bar x_1, \bar x_2, \ldots, \bar x_n)$. If (1) is singular at $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ then the rank is not defined. By the Implicit Function Theorem, the map (1) is non-singular of rank r at $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ if and only if the pullback of every (r + 1)-form in the y's is identically zero near $(\bar x_1, \bar x_2, \ldots, \bar x_n)$, but the pullback of some r-form in the y's is not zero at $(\bar x_1, \bar x_2, \ldots, \bar x_n)$.


Exercises


1 Near what points $(\bar u, \bar v)$ can the equation $y = u^2 - v^2$ be solved to give u = g(y, v)? To give v = g(u, y)? Give explicit solutions. Sketch the level curves y = const. and relate their geometry to the existence of solutions u = g(y, v), v = g(u, y). For what values of y can $y = u^2 - v^2$ be solved? What is 'singular' about the solution if y = 0?


2 Simplify the solution of (5) for u, v as functions of x, y by considering $y \pm 2x$. For what values of (x, y) is there a (u, v), i.e. what is the image of the map (5)? For each of the eight possible choices of sign in (6′) find the points $(\bar u, \bar v)$, if any, at which (6′) then gives a solution of (5).


3 Describe geometrically the solution of $y = u^2 + v^2 + w^2$ for u = g(y, v, w). Near what points $(\bar u, \bar v, \bar w)$ is this possible?


4 Discuss the solution of the equations 
x=u 
y = 2uv 


for (u,v) given (x,y). Sketch the level curves x = const., 
y = const. 


5 Discuss the solution of the equations

$x = e^u \cos v$
$y = e^u \sin v$

for (u, v) given (x, y). [Compare to polar coordinates.]


6 The Implicit Function Theorem (n = m = r) says that a differentiable map $f: \mathbb{R}^n \to \mathbb{R}^n$ which is non-singular of rank n is locally both one-to-one and onto. Exercise 5 shows that neither is true globally. Elaborate on this statement. Which of the two (one-to-one or onto) is true globally of non-singular maps $f: \mathbb{R} \to \mathbb{R}$?


7 Describe the example (12) geometrically as a map $\mathbb{R}^2 \to \mathbb{R} \to \mathbb{R}^2$.


8 Consulting a book on elementary calculus, prove the case m = n = r = 1 of the Implicit Function Theorem. Give a complete definition of the function Arcsin x (assuming the function sin x has been defined).


9 The folium of Descartes is defined by the equation $x^3 + y^3 = xy$. A good picture of the curve can be obtained from the following observations: The equation is unchanged when x and y are interchanged, therefore the curve is symmetric about the line x = y. The curve intersects each line x = const. in at most three points and at least one point. Similarly, the curve intersects each line ax + by = const. in at most three points and at least one point unless a = b, in which case it intersects in at most two points. The intersections of the line x + y = C with the curve are most easily found by using the substitution x = (C/2) + t, y = (C/2) − t. Prove these statements and sketch the folium of Descartes. Find the singularities of the function $F = x^3 + y^3 - xy$. Where is F > 0? Where < 0? Sketch the curves F = const. Near what points (x, y) on the folium F = 0 can the equation F = 0 be taken as defining y implicitly as a function of x? Indicate in a sketch the various functions y = f(x) which satisfy F(x, f(x)) = 0.


10 Using the Implicit Function Theorem, give a sufficient 
condition for an equation F(x, y,z)=0 to determine z 
locally (near a given solution) as a function of x and y. 


11 Let F(x, y, z), G(x, y, z) be two (differentiable) functions of three variables, and let $(\bar x, \bar y, \bar z)$ be a point where dF ≠ 0, dG ≠ 0. F and G are said to be functionally related (or dependent) near $(\bar x, \bar y, \bar z)$ if there is a (differentiable) function f of two variables, defined and not identically zero near $(F(\bar x, \bar y, \bar z), G(\bar x, \bar y, \bar z))$, such that f(F, G) = 0 for (x, y, z) near $(\bar x, \bar y, \bar z)$. Use the Implicit Function Theorem to show that this is true if dF dG = 0 near $(\bar x, \bar y, \bar z)$. Conversely, show that if F and G are functionally related then dF dG = 0 at $(\bar x, \bar y, \bar z)$, and hence that if F, G are functionally related then dF dG = 0.


12 Sketch the graph of the function

$f(x) = \begin{cases} x^2 \sin\left(\dfrac{1}{x}\right), & x \neq 0, \\ 0, & x = 0. \end{cases}$

Show that the derivative $f'(x) = \lim_{h \to 0} [f(x + h) - f(x)]/h$ exists for all x but that f′(x) is not a continuous function. (Therefore, this function is not 'differentiable' in the sense defined in §5.2.)


13 Define the 'local rank' of a system of equations (1) at the point $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ to be the largest integer r such that the pullback of some r-form is not identically zero near $(\bar x_1, \bar x_2, \ldots, \bar x_n)$, and define the 'infinitesimal rank' to be the largest r such that the pullback of some r-form is not zero at the point $(\bar x_1, \bar x_2, \ldots, \bar x_n)$. Define 'singular' and 'non-singular of rank r' in terms of 'local rank' and 'infinitesimal rank'.


5.2 k-Forms on n-Space. Differentiable Maps


Let $(x_1, x_2, \ldots, x_n)$ denote the coordinates on $\mathbb{R}^n$. A (variable) k-form on n-space $\mathbb{R}^n$ is a function assigning k-forms in $x_1, x_2, \ldots, x_n$ to points $(x_1, x_2, \ldots, x_n)$ of $\mathbb{R}^n$. Since constant k-forms are represented by sums of terms of the form $A\,dx_{i_1}\,dx_{i_2} \cdots dx_{i_k}$ in which A is a number, (variable) k-forms can be represented as sums of such terms in which A is a function $A(x_1, x_2, \ldots, x_n)$. In short, the definitions of §2.1 are to be extended in the obvious way to k-forms in n variables.


As before, it is computations with k-forms which are of primary importance. In addition to the rules $dx_i\,dx_i = 0$, $dx_i\,dx_j = -dx_j\,dx_i$, these computations are governed by the rule for computing pullbacks, which is summarized by the formula

$dy_i = \dfrac{\partial y_i}{\partial x_1}\,dx_1 + \dfrac{\partial y_i}{\partial x_2}\,dx_2 + \cdots + \dfrac{\partial y_i}{\partial x_n}\,dx_n.$


More formally, the pullback of a given k-form

(1) $A(y_1, y_2, \ldots, y_m)\,dy_1\,dy_2 \cdots dy_k + \cdots$

in $y_1, y_2, \ldots, y_m$ under a given map

(2) $y_i = f_i(x_1, x_2, \ldots, x_n)$ $(i = 1, 2, \ldots, m)$

is defined to be the k-form in $x_1, x_2, \ldots, x_n$ obtained by carrying out the substitution (2) expressing the y's in terms of the x's and the substitution

(3) $dy_i = \sum_{j=1}^{n} \dfrac{\partial y_i}{\partial x_j}\,dx_j$

expressing the dy's in terms of the x's and dx's. Here $\partial y_i/\partial x_j$ denotes the partial derivative

$\dfrac{\partial y_i}{\partial x_j} = \lim_{h \to 0} \dfrac{f_i(x_1, \ldots, x_j + h, \ldots, x_n) - f_i(x_1, \ldots, x_n)}{h}$,

which is a function of $(x_1, x_2, \ldots, x_n)$. The resulting expression is a sum of multiples of products of 1-forms in the x's. Multiplying out according to the rules of §4.2 then gives a k-form in the x's which is defined to be the pullback. As in §4.2, this definition is valid only after one observes that if the given expression (1) is replaced by another expression representing the same k-form, then the resulting expression for the pullback represents the same k-form as before; this follows from the observation that the rules $\omega\omega = 0$, $\omega\sigma = -\sigma\omega$ hold for arbitrary (variable) 1-forms in the x's as well as for the 1-forms $dx_i$.

*That is, given $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ and given ε > 0 there is a δ > 0 such that $|A(x_1, x_2, \ldots, x_n) - A(\bar x_1, \bar x_2, \ldots, \bar x_n)| < \varepsilon$ whenever $|x_1 - \bar x_1| < \delta$, $|x_2 - \bar x_2| < \delta$, ..., $|x_n - \bar x_n| < \delta$. See §2.3, p. 31.

As in Chapter 2, all k-forms considered will be assumed to be continuous, that is, the coefficient functions $A(x_1, x_2, \ldots, x_n)$ will be assumed to be continuous*. In particular, all mappings (2) considered will be assumed to have the property that the 1-forms (3) are continuous. In other words, the functions (2) will be assumed to have the property that all mn first partial derivatives $\partial y_i/\partial x_j$ exist and are continuous. Such a function (2) is said to be differentiable. (To be more precise, a function $f: \mathbb{R}^n \to \mathbb{R}^m$ whose mn first partial derivatives all exist and are continuous is said to be '$C^1$-differentiable' or 'continuously differentiable'. In this book the word 'differentiable' is used only in this sense unless otherwise stated. This will be emphasized by occasionally writing '(continuously) differentiable' instead of 'differentiable'.) The pullback of a continuous k-form under a differentiable map is a continuous k-form (because compositions, sums, and products of continuous functions are continuous).
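The definition of the pullback can be sketched in code at a single point, where each $\partial y_i/\partial x_j$ is just a number. In this hypothetical representation (not from the text) a k-form is a Python dict mapping an index tuple $(i_1, \ldots, i_k)$, with indices starting at 0, to the coefficient of $dy_{i_1}\cdots dy_{i_k}$; coefficients A(y) are assumed to have been evaluated already at the image point. Products are multiplied out and reduced by the rules $dx_i\,dx_i = 0$, $dx_i\,dx_j = -dx_j\,dx_i$.

```python
from itertools import product

def wedge_basis(indices):
    """Reduce dx_{i1} ... dx_{ik} to sign * (the product in increasing order).

    Uses dx_i dx_i = 0 and dx_i dx_j = -dx_j dx_i; returns (0, None) for a zero product.
    """
    if len(set(indices)) < len(indices):
        return 0, None
    inversions = sum(1 for a in range(len(indices))
                     for b in range(a + 1, len(indices))
                     if indices[a] > indices[b])
    return (-1) ** inversions, tuple(sorted(indices))

def pullback_at_point(form, jac):
    """Pull back a k-form in the y's at one point.

    form: dict mapping index tuples (i1, ..., ik) to coefficients.
    jac:  jac[i][j] = dy_i/dx_j at the point, so dy_i = sum_j jac[i][j] dx_j as in (3).
    Returns the pulled-back k-form in the x's, in the same dict format.
    """
    n = len(jac[0])
    result = {}
    for ys, coeff in form.items():
        # expand the product of (sum_j jac[i][j] dx_j) over the indices i in ys
        for xs in product(range(n), repeat=len(ys)):
            sign, key = wedge_basis(xs)
            if sign == 0:
                continue
            term = coeff * sign
            for i, j in zip(ys, xs):
                term *= jac[i][j]
            result[key] = result.get(key, 0) + term
    return {key: c for key, c in result.items() if c != 0}

# Map (5) at (u, v) = (1, 2): rows are x = uv and y = u**2 + v**2, columns are u and v.
jac = [[2, 1], [2, 4]]
print(pullback_at_point({(0, 1): 1}, jac))  # {(0, 1): 6}, i.e. dx dy = 6 du dv
```

The value 6 agrees with the coefficient $2(v^2 - u^2)$ of $du\,dv$ computed for the map (5) in §5.1.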

The central theorem concerning computations with k-forms is the Chain Rule.


Chain Rule

Let

(4) $y_i = f_i(x_1, x_2, \ldots, x_n)$ $(i = 1, 2, \ldots, m)$

and

(5) $z_i = g_i(y_1, y_2, \ldots, y_m)$ $(i = 1, 2, \ldots, p)$

be differentiable maps $\mathbb{R}^n \to \mathbb{R}^m$ and $\mathbb{R}^m \to \mathbb{R}^p$ respectively. Then the composed map $\mathbb{R}^n \to \mathbb{R}^p$ is differentiable and the pullback of any k-form under the composed map is equal to the pullback of the pullback.




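The Chain Rule will be proved in the following section. In Jacobian notation the statement of the Chain Rule is

$\dfrac{\partial(z_{i_1}, \ldots, z_{i_k})}{\partial(x_{j_1}, \ldots, x_{j_k})} = \sum_{l_1 < l_2 < \cdots < l_k} \dfrac{\partial(z_{i_1}, \ldots, z_{i_k})}{\partial(y_{l_1}, \ldots, y_{l_k})}\,\dfrac{\partial(y_{l_1}, \ldots, y_{l_k})}{\partial(x_{j_1}, \ldots, x_{j_k})}$

where the left side is a Jacobian of k of the z's with respect to k of the x's and where the right side is a sum of $\binom{m}{k}$ terms as in Chapter 4. When k = 1 this is the formula

$\dfrac{\partial z_i}{\partial x_j} = \sum_{\nu=1}^{m} \dfrac{\partial z_i}{\partial y_\nu}\,\dfrac{\partial y_\nu}{\partial x_j}$,

which is the usual chain rule of differentiation. Note that the differentiability of z(x) is asserted to be a consequence of the differentiability of z(y) and y(x).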
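The case k = 1 can be checked numerically by comparing the sum on the right with a central-difference approximation to $\partial z/\partial x_j$ for the composed map. The Python sketch below uses a hypothetical pair of maps $\mathbb{R}^2 \to \mathbb{R}^2 \to \mathbb{R}$, chosen only for illustration, with their partial derivatives written out by hand.

```python
import math

def f(x1, x2):                 # hypothetical map R^2 -> R^2: (y1, y2)
    return x1 * x2, x1 + math.sin(x2)

def jac_f(x1, x2):             # jac_f[nu][j] = dy_nu/dx_j
    return [[x2, x1], [1.0, math.cos(x2)]]

def g(y1, y2):                 # hypothetical function R^2 -> R: z
    return y1 ** 2 + math.exp(y2)

def grad_g(y1, y2):            # dz/dy_nu
    return [2 * y1, math.exp(y2)]

def chain_rule_partial(x1, x2, j):
    """dz/dx_j = sum over nu of (dz/dy_nu)(dy_nu/dx_j)."""
    dz_dy = grad_g(*f(x1, x2))
    J = jac_f(x1, x2)
    return sum(dz_dy[nu] * J[nu][j] for nu in range(2))

def finite_difference_partial(x1, x2, j, h=1e-6):
    """Central-difference approximation to dz/dx_j for z = g(f(x))."""
    step = [0.0, 0.0]
    step[j] = h
    plus = g(*f(x1 + step[0], x2 + step[1]))
    minus = g(*f(x1 - step[0], x2 - step[1]))
    return (plus - minus) / (2 * h)

for j in (0, 1):
    print(chain_rule_partial(0.7, -0.3, j), finite_difference_partial(0.7, -0.3, j))
```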

Many formulas of differential calculus are simple consequences of the Chain Rule. For example, applying the Chain Rule to the composition of the maps

$x = t$, $y = t$, $z = t$ and $p = xyz$

gives $dp = yz\,dx + xz\,dy + xy\,dz = t\cdot t\,dt + t\cdot t\,dt + t\cdot t\,dt = 3t^2\,dt$, i.e. $d(t^3) = 3t^2\,dt$. In the same way $d(t^n) = nt^{n-1}\,dt$ for any positive integer n. Less obvious formulas can be obtained from the Chain Rule by implicit differentiation. As will be shown in the next section, this process results from the application of the Chain Rule to implicitly defined functions.


Implicit Differentiation


If the conditions of the Implicit Function Theorem are satisfied and the functions $f_i$ determine functions $g_i$, $h_i$, then the partial derivatives of the implicit functions $g_i$, $h_i$ at a given point can be found in terms of the partial derivatives of the given functions $f_i$ at that point by writing

(6) $dy_i = \sum_{j=1}^{n} \dfrac{\partial f_i}{\partial x_j}\,dx_j$ $(i = 1, 2, \ldots, m)$,

solving formally for $dx_1, \ldots, dx_r, dy_{r+1}, \ldots, dy_m$ in terms of $dy_1, \ldots, dy_r, dx_{r+1}, \ldots, dx_n$, and writing the result as

(7) $dx_i = \sum_{j=1}^{r} \dfrac{\partial g_i}{\partial y_j}\,dy_j + \sum_{j=r+1}^{n} \dfrac{\partial g_i}{\partial x_j}\,dx_j$ $(i = 1, 2, \ldots, r)$
$dy_i = \sum_{j=1}^{r} \dfrac{\partial h_i}{\partial y_j}\,dy_j$ $(i = r+1, \ldots, m)$,

thereby determining the partial derivatives in question.


Note that the result is expressed in terms of the variables $x_1, x_2, \ldots, x_n$ of the functions f. If $\partial g/\partial y$, $\partial g/\partial x$, $\partial h/\partial y$ are to be found as functions of the variables $(y_1, \ldots, y_r, x_{r+1}, \ldots, x_n)$, then the explicit solutions g, h for the x's as functions of $(y_1, \ldots, y_r, x_{r+1}, \ldots, x_n)$ must be used. Note also that in this context the Implicit Function Theorem can be stated simply: "The functions g, h exist if and only if they can be differentiated implicitly and the partials of h with respect to $x_{r+1}, x_{r+2}, \ldots, x_n$ are identically zero," because (i) is true if and only if the first r equations of (6) can be solved for $dx_1, dx_2, \ldots, dx_r$, and (ii) is true if and only if the remaining equations are then independent of $dx_{r+1}, \ldots, dx_n$.


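Examples


If $y = u^2 + v^2$ is solved for u = g(y, v) then the derivatives of g are found by

$dy = 2u\,du + 2v\,dv$
$du = \dfrac{1}{2u}\,dy - \dfrac{v}{u}\,dv$

giving

$\dfrac{\partial g}{\partial y} = \dfrac{1}{2u}$, $\dfrac{\partial g}{\partial v} = -\dfrac{v}{u}$.

The Implicit Function Theorem says that g is defined locally near any point where this solution is possible, i.e. where $u \neq 0$. If the derivatives of g(y, v) are to be expressed as functions of (y, v) then the solution $g(y, v) = \pm\sqrt{y - v^2}$ must be used. This gives

$\dfrac{\partial}{\partial y}\left[\pm\sqrt{y - v^2}\right] = \pm\dfrac{1}{2\sqrt{y - v^2}}$
$\dfrac{\partial}{\partial v}\left[\pm\sqrt{y - v^2}\right] = \mp\dfrac{v}{\sqrt{y - v^2}}$.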
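The implicitly computed derivatives $\partial g/\partial y = 1/(2u)$ and $\partial g/\partial v = -v/u$ can be compared with difference quotients of the explicit solution. A Python sketch at the sample point $(u, v) = (-0.6, 0.8)$, where the minus sign in $g = \pm\sqrt{y - v^2}$ applies; the names and the step size h are illustrative.

```python
import math

def g(y, v, sign=1):
    """Explicit local solution u = g(y, v) = sign * sqrt(y - v**2)."""
    return sign * math.sqrt(y - v * v)

def implicit_partials(u, v):
    """Partials of g read off from du = dy/(2u) - (v/u) dv, valid for u != 0."""
    return 1 / (2 * u), -v / u

# Compare with central differences at (u, v) = (-0.6, 0.8), i.e. y = 1.
u, v, sign = -0.6, 0.8, -1
y = u * u + v * v
h = 1e-6
dg_dy = (g(y + h, v, sign) - g(y - h, v, sign)) / (2 * h)
dg_dv = (g(y, v + h, sign) - g(y, v - h, sign)) / (2 * h)
print(implicit_partials(u, v), (dg_dy, dg_dv))
```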




If $x = t^3$, $y = t^2$ is solved for t = g(x), y = h(x) then the derivatives of g, h are found by setting

$dx = 3t^2\,dt$
$dy = 2t\,dt$

and solving

$dt = \dfrac{1}{3t^2}\,dx$, $g'(x) = \dfrac{1}{3t^2}$
$dy = \dfrac{2}{3t}\,dx$, $h'(x) = \dfrac{2}{3t}$.

This is possible if and only if $t \neq 0$, which is the condition required by the Implicit Function Theorem to guarantee (locally) the existence of g, h. To express g′, h′ as functions of x the explicit solution $t = x^{1/3}$ must be used, giving

$\dfrac{d}{dx}\left[x^{1/3}\right] = \dfrac{1}{3x^{2/3}} = \tfrac{1}{3}x^{-2/3}$
$\dfrac{d}{dx}\left[x^{2/3}\right] = \dfrac{2}{3x^{1/3}} = \tfrac{2}{3}x^{-1/3}$.


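If polar coordinates $x = r\cos\theta$, $y = r\sin\theta$ are solved for $r = g_1(x, y)$, $\theta = g_2(x, y)$, then the derivatives can be found by solving

$dx = \cos\theta\,dr - r\sin\theta\,d\theta$
$dy = \sin\theta\,dr + r\cos\theta\,d\theta$

for dr, dθ. Since

$\cos\theta\,dx + \sin\theta\,dy = dr$
$-\sin\theta\,dx + \cos\theta\,dy = r\,d\theta$

the solution is

$dr = \cos\theta\,dx + \sin\theta\,dy = \dfrac{\partial g_1}{\partial x}\,dx + \dfrac{\partial g_1}{\partial y}\,dy$
$d\theta = -\dfrac{\sin\theta}{r}\,dx + \dfrac{\cos\theta}{r}\,dy = \dfrac{\partial g_2}{\partial x}\,dx + \dfrac{\partial g_2}{\partial y}\,dy$.

The functions $g_1$, $g_2$ are defined locally near any point where $r \neq 0$. If the derivatives of $g_1$, $g_2$ are to be expressed as functions of (x, y) the solution

$r = \sqrt{x^2 + y^2} = g_1(x, y)$
$\theta = \operatorname{Arctan}\left(\dfrac{y}{x}\right) = g_2(x, y)$ (if $x \neq 0$)

must be used, giving

$\dfrac{\partial}{\partial x}\sqrt{x^2 + y^2} = \dfrac{x}{\sqrt{x^2 + y^2}} = \cos\theta$, $\dfrac{\partial}{\partial y}\sqrt{x^2 + y^2} = \dfrac{y}{\sqrt{x^2 + y^2}} = \sin\theta$
$\dfrac{\partial}{\partial x}\operatorname{Arctan}\left(\dfrac{y}{x}\right) = \dfrac{-y}{x^2 + y^2} = -\dfrac{\sin\theta}{r}$, $\dfrac{\partial}{\partial y}\operatorname{Arctan}\left(\dfrac{y}{x}\right) = \dfrac{x}{x^2 + y^2} = \dfrac{\cos\theta}{r}$.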

Exercises


1 If a point moves in the xy-plane according to parametric equations x = f(t), y = g(t) and if F(x, y) is a quantity (temperature, pressure, altitude) depending on (x, y), then the rate of change of F with respect to t is expressed by the Chain Rule in terms of

(f(t), g(t)) = location at time t
f′(t) = velocity in x-direction at time t
g′(t) = velocity in y-direction at time t

and the partial derivatives of F.

(a) Let F(x, y) = xy, let (x, y) = (f(t), g(t)), and let a = f′(t), b = g′(t). Find the rate of change of F in terms of x, y, a, b.

(b) Let an arrow starting at (x, y) and ending at (x + a, y + b) represent position (x, y) and velocity (a, b); such an arrow is called a 'velocity vector'. Draw the velocity vectors corresponding to

position     velocity
(1, 0)       (0, 1)
(2, 2)       (0, 1)
(−3, 1)      (0, 3)
(−4, −2)     ( , 1)

(c) For each of these velocity vectors find the rate of change of F(x, y) = xy, paying particular attention to the sign.

(d) If the position is (2, −3) and if the point is moving along the hyperbola xy = −6, what condition must be satisfied by the velocity (a, b) according to part (a)? Draw several velocity vectors satisfying this condition.

(e) Using (d) find the equation of the line tangent to xy = −6 at (2, −3). [A line through (2, −3) can be written in the form A(x − 2) + B(y + 3) = 0.]




(f) Find the equation of the line tangent to $xy = \bar x\bar y$ at the point $(\bar x, \bar y)$.

(g) Why is this not valid at (0, 0)?

(h) Find the slope of the tangent line to xy = 1 at the point $(\bar x, 1/\bar x)$, hence the derivative of the function f(x) = 1/x.

(i) Find the derivative of 1/x by implicit differentiation of xy = 1.


2 (a) If $F(x, y) = 2x^2 + y^2$ and if a point at $(\bar x, \bar y)$ moves with the velocity (a, b), what is the rate of change of F?

(b) Sketch the ellipse $2x^2 + y^2 = 12$ near the point (2, 2). On which side of the ellipse is F > 12 and on which is it < 12?

(c) Draw six velocity vectors at (2, 2), two for which $2x^2 + y^2$ has positive derivative, two for which it has negative derivative, and two for which its derivative is zero.

(d) Find the equation of the tangent line to the ellipse at (2, 2). [Write it in the form A(x − 2) + B(y − 2) = 0.]

(e) Find the equation of the tangent line to the ellipse $2x^2 + y^2$ = const. at the point $(\bar x, \bar y)$. [Write it in the form $A(x - \bar x) + B(y - \bar y) = 0$.]

(f) Use (e) to find the slope of the tangent to the graph of $y = -\sqrt{1 - 2x^2}$ ($|x| < 1/\sqrt{2}$) as a function of x.

(g) Find the derivative of $f(x) = -\sqrt{1 - 2x^2}$ by implicit differentiation of $1 = 2x^2 + y^2$.


3 (a) Find the line which is tangent to the curve $x^3 + y^3 - xy$ = const. at the point $(\bar x, \bar y)$. [Write the answer in the form $A(x - \bar x) + B(y - \bar y) = 0$.]
(b) Find all points at which the tangent line to the folium of Descartes $\{x^3 + y^3 = xy\}$ is vertical.
(c) Is horizontal.
(d) Has slope −1.


4 (a) What is the general formula for the line tangent to F(x, y) = const. at $(\bar x, \bar y)$? [Mnemonically the answer can be written dF = 0.]

(b) If f(x) is a function such that F(x, f(x)) = const., what formula does this give for f′(x) in terms of the derivatives of F? When is it valid?

(c) Deduce the same formula by implicit differentiation.


5 Give a rigorous statement of the formula

$\dfrac{dy}{dx} = \dfrac{1}{dx/dy}$

[for the derivative of an inverse function x = g(y) of y = f(x)] as an implicit differentiation. Given that $\dfrac{d}{dx}\left[e^x\right] = e^x$, use this formula to find the derivative of log x (the inverse function to $e^x$).


6 Let p and q be positive integers. The Implicit Function Theorem implies that the curve $y = t^p$, $x = t^q$ can be solved locally (except near the origin) to give y = h(x).

(a) Show that for x > 0 there is a unique y > 0 such that (x, y) is on the curve. The function so defined is denoted $y = x^{p/q}$. Note that it is defined only for x > 0.

(b) For which values of p, q can $x^{p/q}$ be defined in this way for all x?

(c) By the Implicit Function Theorem $x^{p/q}$ is differentiable for x > 0; find its derivative by implicit differentiation and express the result as a function of x.

(d) The function $|x|^r$ is defined for all x provided r is a positive rational number r = p/q and, by the Chain Rule, it is differentiable except possibly at x = 0. Sketch the graph of $|x|^r$ for r = 1/2, 1, 3/2, 2, 2/3, 1/1,000, 1,000.

(e) Show that $|x|^r$ is differentiable if and only if r > 1.

(f) Give a formula for the derivative of $|x|^r$ which is valid for all x.


7 Directional derivatives. If a point in xyz-space moves according to parametric equations x = f(t), y = g(t), z = h(t) then the rate of change of F(x, y, z) with respect to t at time $\bar t$ is expressed by the Chain Rule in terms of

$(f(\bar t), g(\bar t), h(\bar t))$ = location at time $\bar t$

$f'(\bar t) = a$, $g'(\bar t) = b$, $h'(\bar t) = c$, the three components of the velocity at time $\bar t$

and the partial derivatives of F. This is called the 'directional derivative of F in the direction of the vector (a, b, c) at the point $(f(\bar t), g(\bar t), h(\bar t))$'.
(a) Find the directional derivative of $F(x, y, z) = 3x^2 + 2y^2 + z^2$ in the direction of the vector (a, b, c) at the point (2, −3, 1).
(b) Which vectors (a, b, c) point into the ellipsoid F = const., which point out of it, and which are tangent to F = const. at (2, −3, 1)?
(c) What is the general formula for the directional derivative of F(x, y, z) at $(\bar x, \bar y, \bar z)$ in the direction of the vector (a, b, c)?


8 Tangent planes. Assuming that dF # 0 at (X, y, Z), by the 
Implicit Function Theorem, F = const. can be solved for one 


Chapter5 | Differential Calculus 


150 


of the variables as a function of the other two, hence F = 
const. is a two-dimensional surface near (x, y, Z). 


(a) Justify the statement that the plane defined by the
shorthand equation dF = 0 (see Exercise 4) is tangent
to the surface F = const. at $(\bar x, \bar y, \bar z)$ by showing that
the velocity vector $f'(\bar t) = a$, $g'(\bar t) = b$, $h'(\bar t) = c$ of
any motion (f(t), g(t), h(t)) through $(\bar x, \bar y, \bar z) =
(f(\bar t), g(\bar t), h(\bar t))$ lying in the surface F = const. lies
in the plane dF = 0.

(b) Find the equation of the plane which is tangent to the
ellipsoid of Exercise 7 at (2, −3, 1).


9 Implicit differentiation. Let F(x, y, z) be a function of
three variables with $\frac{\partial F}{\partial z}(\bar x, \bar y, \bar z) \neq 0$. Then locally the equation
F = const. can be solved to give z = f(x, y), i.e. f is
determined by the condition

$$F(x, y, f(x, y)) = F(\bar x, \bar y, \bar z).$$


(a) Describe the tangent plane to the graph of f in terms 
of its partial derivatives. 

(b) Since the graph of f is a surface F = const., the
equation of its tangent plane can also be written
dF = 0. Use the fact that two planes

$$A_1 x + B_1 y + C_1 z + D_1 = 0$$
$$A_2 x + B_2 y + C_2 z + D_2 = 0$$

are identical if and only if $A_1:B_1:C_1:D_1 =
A_2:B_2:C_2:D_2$, i.e. if and only if their equations
differ by a non-zero multiple, to express the partial
derivatives of f in terms of those of F.

(c) Find the same formula directly by implicit differentia- 
tion. 
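A numerical sketch of the situation in Exercise 9, assuming the ellipsoid function of Exercise 7 as the sample F: the implicitly defined z = f(x, y) is computed by Newton's method (a choice made here only for illustration), and its difference quotients are compared with $-\frac{\partial F/\partial x}{\partial F/\partial z}$ and $-\frac{\partial F/\partial y}{\partial F/\partial z}$, the expressions that parts (b) and (c) lead to.

```python
def F(x, y, z):
    return 3 * x**2 + 2 * y**2 + z**2


def partials_F(x, y, z):
    return (6 * x, 4 * y, 2 * z)


def solve_for_z(x, y, level, z_start, steps=50):
    """Newton's method for z -> F(x, y, z) - level, started near a known solution."""
    z = z_start
    for _ in range(steps):
        z -= (F(x, y, z) - level) / partials_F(x, y, z)[2]
    return z


x0, y0, z0 = 2.0, -3.0, 1.0
level = F(x0, y0, z0)       # the surface F = const. through (x0, y0, z0)


def f(x, y):
    """The locally defined function z = f(x, y) with F(x, y, f(x, y)) = level."""
    return solve_for_z(x, y, level, z0)


h = 1e-5
df_dx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
df_dy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)
Fx, Fy, Fz = partials_F(x0, y0, z0)
print(df_dx, -Fx / Fz)
print(df_dy, -Fy / Fz)
```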


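10 Spherical coordinates. The map

$$x = r\cos\theta\cos\phi, \qquad y = r\sin\theta\cos\phi, \qquad z = r\sin\phi$$

defines spherical coordinates on xyz-space. [r = distance
from origin, θ = longitude, φ = latitude.]

(a) Find $\dfrac{\partial(x, y, z)}{\partial(r, \theta, \phi)}$.

(b) At what points is the map non-singular of rank three?
Are the remaining points singularities, or are they non-singular
of a different rank?

(c) Give ranges for (r, θ, φ) such that all points of xyz-space
except those on the half plane y = 0, x ≥ 0 are
covered exactly once.

(d) Using implicit differentiation, show that the derivatives
of the inverse function are

$$\begin{aligned}
\frac{\partial r}{\partial x} &= \frac{x}{r}, & \frac{\partial r}{\partial y} &= \frac{y}{r}, & \frac{\partial r}{\partial z} &= \frac{z}{r},\\
\frac{\partial \theta}{\partial x} &= \frac{-y}{x^2 + y^2}, & \frac{\partial \theta}{\partial y} &= \frac{x}{x^2 + y^2}, & \frac{\partial \theta}{\partial z} &= 0,\\
\frac{\partial \phi}{\partial x} &= \frac{-xz}{r^2\sqrt{x^2 + y^2}}, & \frac{\partial \phi}{\partial y} &= \frac{-yz}{r^2\sqrt{x^2 + y^2}}, & \frac{\partial \phi}{\partial z} &= \frac{\sqrt{x^2 + y^2}}{r^2}.
\end{aligned}$$

[Use the combinations cos θ dx + sin θ dy and
−sin θ dx + cos θ dy as in the text. The above
formulas assume that r > 0 and cos φ > 0.]


5.3 Proofs

[Figure: a given point Q of the magnified domain goes to P + sQ, then to f(P + sQ), and finally to the point $M_{s,P}(Q) = [f(P + sQ) - f(P)]/s$ of the magnified range.]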


A good intuitive understanding of the meaning of
differentiability of maps $f: \mathbb{R}^n \to \mathbb{R}^m$ can be achieved by
imagining that the map is being examined under a
microscope.

The idea of a ‘microscope of power 1:s’ directed at the
point P of $\mathbb{R}^n$ can be described mathematically as a new
coordinate system on $\mathbb{R}^n$ in which P is the origin and in
which the scale is changed in the ratio 1:s. Thus the new
coordinates of a point P′ are the amounts by which P′
differs from P multiplied by the (large) factor 1/s (where
s is small). In short,

$$Q = \frac{P' - P}{s}$$

gives the location of P′ as seen through a microscope of
power 1:s directed at P. Conversely, the point which
‘appears’ at Q under the microscope is the point

$$P' = P + sQ.$$


Now to examine a map $f: \mathbb{R}^n \to \mathbb{R}^m$ under a microscope
naturally means to place both the range and the
domain under microscopes and to examine the map of
the magnified domain to the magnified range. Specifically,
microscopes of power 1:s are directed at P and f(P).
Then a point Q of the magnified domain corresponds to
a point P + sQ of the domain; its image is f(P + sQ),
which appears at

$$M_{s,P}(Q) = \frac{f(P + sQ) - f(P)}{s} \tag{1}$$

under the microscope directed at f(P). In short, the map
$M_{s,P}$ defined by (1) can be thought of as ‘f at P under
microscopes of power 1:s’.

Intuitively a differentiable map is one which is nearly
affine under sufficiently high-powered microscopes. If the
original map f is in fact affine, then its appearance under
a microscope is extremely simple: If $P = (\bar x_1, \bar x_2, \dots,
\bar x_n)$, if $Q = (h_1, h_2, \dots, h_n)$, and if $f: \mathbb{R}^n \to \mathbb{R}^m$ is the
affine map

$$y_i = \sum_{j=1}^n a_{ij} x_j + b_i \qquad (i = 1, 2, \dots, m)$$

then the ith coordinate of $M_{s,P}(Q)$ is

$$\frac{f_i(P + sQ) - f_i(P)}{s} = \frac{1}{s}\left[\left(\sum_{j=1}^n a_{ij}(\bar x_j + s h_j) + b_i\right) - \left(\sum_{j=1}^n a_{ij}\bar x_j + b_i\right)\right] = \frac{1}{s}\sum_{j=1}^n a_{ij}(s h_j) = \sum_{j=1}^n a_{ij} h_j.$$

Hence the point $(h_1, h_2, \dots, h_n)$ of the magnified
domain is carried to the point

$$\left(\sum_{j=1}^n a_{1j}h_j, \sum_{j=1}^n a_{2j}h_j, \dots, \sum_{j=1}^n a_{mj}h_j\right)$$

of the magnified range, regardless of the ratio of magnification
1:s. Here $a_{ij} = \partial y_i/\partial x_j$ and the formula can be
abbreviated by

$$dy_i = \sum_{j=1}^n \frac{\partial y_i}{\partial x_j}\, dx_j$$

where the $dx_j = h_j$ give the location of the point in the
magnified domain, and the $dy_i$ give the location of its image in the
magnified range. It will be shown that a differentiable
map

$$y_i = f_i(x_1, x_2, \dots, x_n) \qquad (i = 1, 2, \dots, m) \tag{2}$$

is one for which the location $(dy_i)$ of the image in the
magnified range can be expressed by a similar
formula in terms of the location $(dx_j)$ of the point in the magnified domain,
in the limit as s → 0. Specifically, if the map (2) is
differentiable then

$$\lim_{s \to 0} M_{s,P}(Q) = \left(\sum_{j=1}^n \frac{\partial y_1}{\partial x_j} h_j, \dots, \sum_{j=1}^n \frac{\partial y_m}{\partial x_j} h_j\right) \tag{3}$$

where $Q = (h_1, h_2, \dots, h_n)$ and where the partial
derivatives $\partial y_i/\partial x_j$ are evaluated at $P = (\bar x_1, \bar x_2, \dots, \bar x_n)$.
This formula can be abbreviated:

$$dy_i = \sum_{j=1}^n \frac{\partial y_i}{\partial x_j}\, dx_j. \tag{4}$$

The formula (3), which is proved below, is the real meaning
of the formula (4) by which pullbacks are defined.
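To watch formulas (1) and (3) at work, the Python sketch below evaluates $M_{s,P}(Q)$ for shrinking s. Both maps are hypothetical examples chosen for illustration: for the affine map the result is the same for every s, while for the non-affine map it approaches the right side of (3), computed from hand-derived partial derivatives.

```python
import math


def microscope(f, P, Q, s):
    """M_{s,P}(Q) = (f(P + sQ) - f(P)) / s, coordinate by coordinate (formula (1))."""
    moved = [p + s * q for p, q in zip(P, Q)]
    return [(a - b) / s for a, b in zip(f(moved), f(P))]


def affine(x):
    # hypothetical affine map: y1 = 2 x1 - x2 + 5, y2 = x1 + 3 x2 - 1
    return [2 * x[0] - x[1] + 5, x[0] + 3 * x[1] - 1]


def curved(x):
    # hypothetical non-affine map: y1 = x1^2 x2, y2 = sin(x1) + x2^3
    return [x[0] ** 2 * x[1], math.sin(x[0]) + x[1] ** 3]


def jacobian_curved(x):
    # matrix of partial derivatives dy_i/dx_j of `curved`, computed by hand
    return [[2 * x[0] * x[1], x[0] ** 2],
            [math.cos(x[0]), 3 * x[1] ** 2]]


P, Q = [1.0, 2.0], [0.5, -1.0]
J = jacobian_curved(P)
limit = [sum(J[i][j] * Q[j] for j in range(2)) for i in range(2)]   # right side of (3)

for s in [1.0, 0.1, 0.01, 0.001]:
    print(s, microscope(affine, P, Q, s), microscope(curved, P, Q, s), limit)
```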


Proof of the Chain Rule 


The essence of the proof is to show that the equation (3) 
holds for differentiable maps. From this it is not difficult 
to prove the Chain Rule for 1-forms, from which the 
Chain Rule for k-forms follows algebraically as in §4.2. 

The proof of (3) is based on the Fundamental Theorem
of Calculus

$$\int_a^b F'(t)\,dt = F(b) - F(a).$$


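Going from P to P + sQ in n steps along n line segments
parallel to coordinate axes, namely

from $(\bar x_1, \bar x_2, \dots, \bar x_n)$
to $(\bar x_1 + sh_1, \bar x_2, \dots, \bar x_n)$
to $(\bar x_1 + sh_1, \bar x_2 + sh_2, \bar x_3, \dots, \bar x_n)$
to ... to $(\bar x_1 + sh_1, \bar x_2 + sh_2, \dots, \bar x_n + sh_n)$,

and writing the change in the value of $f_i$ along each
such line segment as the integral of the corresponding
partial derivative gives the formula

$$\begin{aligned}
f_i(P + sQ) - f_i(P) &= f_i(\bar x_1 + sh_1, \dots, \bar x_n + sh_n) - f_i(\bar x_1, \bar x_2, \dots, \bar x_n)\\
&= \sum_{j=1}^n \int_{\bar x_j}^{\bar x_j + sh_j} \frac{\partial y_i}{\partial x_j}(\bar x_1 + sh_1, \dots, \bar x_{j-1} + sh_{j-1}, t, \bar x_{j+1}, \dots, \bar x_n)\,dt.
\end{aligned} \tag{5}$$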
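Formula (5) can be checked numerically. The sketch below integrates each partial derivative along its axis-parallel segment with composite Simpson's rule (a quadrature method chosen here for convenience, not one used in the text) and compares the sum with $f_i(P + sQ) - f_i(P)$. The sample function f and the points P, Q are hypothetical.

```python
import math


def f(x):
    """A sample coordinate function f_i of three variables (hypothetical)."""
    return x[0] * x[1] + math.exp(x[1]) * math.sin(x[2])


# Its partial derivatives, computed by hand.
partials = [
    lambda x: x[1],
    lambda x: x[0] + math.exp(x[1]) * math.sin(x[2]),
    lambda x: math.exp(x[1]) * math.cos(x[2]),
]


def simpson(g, a, b, n=200):
    """Composite Simpson's rule for the integral of g from a to b (n must be even)."""
    width = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * width)
    return total * width / 3


def right_side_of_5(P, Q, s):
    """Sum over j of the integral of the j-th partial along the j-th axis-parallel segment."""
    total = 0.0
    for j in range(len(P)):
        def integrand(t, j=j):
            # coordinates before j already moved, coordinate j varies, the rest still at P
            point = [P[k] + s * Q[k] for k in range(j)] + [t] + list(P[j + 1:])
            return partials[j](point)
        total += simpson(integrand, P[j], P[j] + s * Q[j])
    return total


P, Q, s = [1.0, 0.5, 0.3], [0.4, -1.0, 2.0], 0.5
left = f([p + s * q for p, q in zip(P, Q)]) - f(P)
print(left, right_side_of_5(P, Q, s))
```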


To say that the partial derivatives are continuous means
that for very small s the integrands in these integrals are
practically constant, and, specifically, practically equal to
$\frac{\partial y_i}{\partial x_j}(\bar x_1, \bar x_2, \dots, \bar x_n)$. If the integrands actually did have
this constant value then the integrals would be
$sh_j\,\frac{\partial y_i}{\partial x_j}(\bar x_1, \bar x_2, \dots, \bar x_n)$ and, dividing by s,

$$M_{s,P}(Q) \approx \left(\sum_{j=1}^n \frac{\partial y_1}{\partial x_j} h_j, \dots, \sum_{j=1}^n \frac{\partial y_m}{\partial x_j} h_j\right) \tag{6}$$

where the approximation consists in replacing the
integrands in (5) by the constants $\frac{\partial y_i}{\partial x_j}(\bar x)$. The equation
(3) to be proved is the assertion that the error in the
approximation (6) goes to zero as s → 0. Very briefly,
this is true because the difference between two integrals
is at most the maximum difference between the two
integrands times the length of the path of integration;
this is an arbitrarily small number times

$$|s|\,(|h_1| + |h_2| + \dots + |h_n|);$$

hence the error divided by s is arbitrarily small and (3)
follows.

In detail: Given a map (2), a point $P = (\bar x_1, \bar x_2, \dots,
\bar x_n)$, a bound B (to be thought of as the size of the eyepiece
of the microscope magnifying the domain), and a
margin for error ε, there is a degree of magnification δ
such that the error in the approximation (6) is less
than ε in each of the m coordinates whenever |s| < δ and
$|h_j| < B$ (j = 1, 2, ..., n). That is,

$$\left|\frac{f_i(\bar x + sh) - f_i(\bar x)}{s} - \sum_{j=1}^n \frac{\partial y_i}{\partial x_j}(\bar x)\,h_j\right| < \varepsilon$$

for i = 1, 2, ..., m. The proof is as follows: Since
$\partial y_i/\partial x_j$ is a continuous function there is a $\delta_{ij}$ such that its value
at $(x_1, x_2, \dots, x_n)$ differs by less than ε/nB from its
value at $(\bar x_1, \bar x_2, \dots, \bar x_n)$ whenever $|x_1 - \bar x_1| < \delta_{ij}, \dots,
|x_n - \bar x_n| < \delta_{ij}$. Let such $\delta_{ij}$ be chosen for each i and j,
and let $\delta_0$ be the smallest of these mn numbers. Finally,
let $\delta = \delta_0/B$ so that the path of integration in (5) lies
entirely inside the region where $\partial y_i/\partial x_j$ is known to differ by
less than ε/nB from its value at P, provided |s| < δ.
Changing the integrands in (5) to $\frac{\partial y_i}{\partial x_j}(\bar x)$ then introduces
an error of at most (ε/nB) × (n|s|B) = ε|s|, so the error in
(6) is at most ε, as was to be shown.
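The uniform estimate just proved can be observed numerically. Assuming a hypothetical sample map (the same kind used only for illustration), the sketch measures the largest coordinate error in (6) over a grid of points Q with $|h_j| \le B$ and shows it shrinking roughly in proportion to s.

```python
import itertools
import math


def curved(x):
    # hypothetical sample map: y1 = x1^2 x2, y2 = sin(x1) + x2^3
    return [x[0] ** 2 * x[1], math.sin(x[0]) + x[1] ** 3]


def jacobian_curved(x):
    # partial derivatives dy_i/dx_j of `curved`, computed by hand
    return [[2 * x[0] * x[1], x[0] ** 2],
            [math.cos(x[0]), 3 * x[1] ** 2]]


def worst_error(P, B, s, steps=11):
    """Largest coordinate error in approximation (6) over a grid of Q with |h_j| <= B."""
    J = jacobian_curved(P)
    fP = curved(P)
    grid = [-B + 2 * B * k / (steps - 1) for k in range(steps)]
    worst = 0.0
    for Q in itertools.product(grid, repeat=len(P)):
        moved = curved([p + s * q for p, q in zip(P, Q)])
        for i in range(len(fP)):
            exact = (moved[i] - fP[i]) / s
            approx = sum(J[i][j] * Q[j] for j in range(len(P)))
            worst = max(worst, abs(exact - approx))
    return worst


P, B = [1.0, 2.0], 1.0
for s in [0.1, 0.01, 0.001]:
    print(s, worst_error(P, B, s))
```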

Now let differentiable maps $f: \mathbb{R}^n \to \mathbb{R}^m$, $g: \mathbb{R}^m \to \mathbb{R}^p$
be given as in the statement of the Chain Rule in §5.2.
The notation of the proof of the Chain Rule is inevitably
somewhat cumbersome, but the idea of the proof is
essentially contained in (6). The composed map
$g \circ f: \mathbb{R}^n \to \mathbb{R}^p$ under a microscope is

$$\frac{g[f(P + sQ)] - g[f(P)]}{s}, \tag{7}$$

which can be written

$$\frac{g[f(P) + sQ'] - g[f(P)]}{s}$$

where

$$Q' = \frac{f(P + sQ) - f(P)}{s}.$$

When P is fixed and s is very small, Q′ is very nearly

$$\left(\sum_{j=1}^n a_{1j}h_j, \dots, \sum_{j=1}^n a_{mj}h_j\right) \quad \text{where } Q = (h_1, h_2, \dots, h_n)$$

and $a_{ij} = \frac{\partial y_i}{\partial x_j}(P)$, so that (7) is very nearly

$$\left(\sum_{\nu=1}^m \sum_{j=1}^n b_{1\nu}a_{\nu j}h_j, \dots, \sum_{\nu=1}^m \sum_{j=1}^n b_{p\nu}a_{\nu j}h_j\right)$$

where $b_{i\nu}$ is the value of $\partial z_i/\partial y_\nu$ at f(P). As s → 0 all approximations
improve and the formula becomes

$$dz_i = \sum_{j=1}^n \left(\sum_{\nu=1}^m b_{i\nu}a_{\nu j}\right) dx_j,$$

which is the Chain Rule.
To fill in the details so that this becomes a proof of the
Chain Rule, note first that it will suffice to prove that the
partial derivatives $\partial z_i/\partial x_j$ of the composed function exist
and are given by the formula

$$\frac{\partial z_i}{\partial x_j} = \sum_{\nu=1}^m \frac{\partial z_i}{\partial y_\nu}\frac{\partial y_\nu}{\partial x_j} \tag{8}$$

since this implies that the composed function is differentiable
(sums and products of continuous functions
are continuous) and that the pullback of $dz_i$ under the
composed map is the pullback of the pullback. The same
is then true of the pullback of any 1-form $\sum A_i\,dz_i$
under the composed map; hence, by the argument of
§4.2, the same is also true of the pullback of any k-form.
Thus the formula to be proved is (8), which is the assertion
that if Q is 0 except for a 1 in the jth place then the
limit of the ith component of (7) as s → 0 exists and is
equal to $\sum_{\nu=1}^m b_{i\nu}a_{\nu j}$, where the a's and b's are defined as
above.

To simplify notation, the case i = j = 1 will be considered,
the other cases being exactly the same. Then
Q = (1, 0, 0, ..., 0) and it was proved above that
$Q' = [f(P + sQ) - f(P)]/s$ differs from $(a_{11}, a_{21}, \dots,
a_{m1})$ by less than any prescribed ε once s is sufficiently
small. Let $h_\nu$ (ν = 1, 2, ..., m) denote the actual coordinates
of Q′, and let B be a number larger than
$\max(|a_{11}|, |a_{21}|, \dots, |a_{m1}|)$ so that $|h_\nu| < B$ whenever
s is sufficiently small. It was shown above that then the
first coordinate of (7) is within any prescribed ε of
$\sum_{\nu=1}^m b_{1\nu}h_\nu$ once s is sufficiently small. But $h_\nu$ differs from
$a_{\nu 1}$ by less than any prescribed ε once s is small, hence
$\sum_\nu b_{1\nu}h_\nu$ differs from $\sum_\nu b_{1\nu}a_{\nu 1}$ by less than $\sum_\nu |b_{1\nu}|\,\varepsilon$.
Thus the first coordinate of (7) differs from $\sum_\nu b_{1\nu}a_{\nu 1}$ by
less than $\varepsilon\sum_\nu |b_{1\nu}| + \varepsilon$ for arbitrarily small ε once s is
sufficiently small. Therefore the limit as s → 0 of the first coordinate
of (7) exists and is $\sum_\nu b_{1\nu}a_{\nu 1}$, as was to be shown. This
completes the proof of the Chain Rule.
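Formula (8) can be tested on a concrete pair of maps. In this sketch the maps f and g are hypothetical examples, and every partial derivative is approximated by a central difference quotient (that is, by the microscope map with Q a coordinate vector and a small s); the matrix of partial derivatives of g ∘ f should agree with the product of the matrices $(b_{i\nu})$ and $(a_{\nu j})$ up to rounding.

```python
import math


def f(x):
    """Hypothetical map f: R^2 -> R^3, (x1, x2) -> (y1, y2, y3)."""
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]


def g(y):
    """Hypothetical map g: R^3 -> R^2, (y1, y2, y3) -> (z1, z2)."""
    return [y[0] + y[1] * y[2], math.exp(y[0]) - y[2]]


def numerical_jacobian(func, x, s=1e-6):
    """Entry [i][j] approximates d(func_i)/dx_j by a central difference quotient."""
    m = len(func(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        plus, minus = list(x), list(x)
        plus[j] += s
        minus[j] -= s
        fp, fm = func(plus), func(minus)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * s)
    return J


def matmul(A, B):
    return [[sum(A[i][v] * B[v][j] for v in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


P = [0.7, -1.2]
a = numerical_jacobian(f, P)                      # a[v][j] ~ dy_v/dx_j at P
b = numerical_jacobian(g, f(P))                   # b[i][v] ~ dz_i/dy_v at f(P)
composed = numerical_jacobian(lambda x: g(f(x)), P)
print(composed)
print(matmul(b, a))                               # sum over v of b[i][v] * a[v][j]
```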


Implicit Differentiation 
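Let $f_i, g_i, h_i$ be differentiable functions satisfying the
relations of the Implicit Function Theorem and let

$$a_{ij} = \frac{\partial f_i}{\partial x_j}, \qquad A_{ij} = \frac{\partial g_i}{\partial y_j}, \qquad B_{ij} = \frac{\partial g_i}{\partial x_j}, \qquad C_{ij} = \frac{\partial h_i}{\partial y_j},$$

all derivatives being evaluated at $(\bar x_1, \bar x_2, \dots, \bar x_n, \bar y_1,
\bar y_2, \dots, \bar y_m)$. The relations satisfied by $f_i, g_i, h_i$ can be
stated:

$$\begin{aligned}
f_i(g_1(u), \dots, g_r(u), u_{r+1}, \dots, u_n) &= u_i && (i = 1, 2, \dots, r)\\
f_i(g_1(u), \dots, g_r(u), u_{r+1}, \dots, u_n) &= h_i(u_1, \dots, u_r) && (i = r+1, \dots, m)
\end{aligned}$$

for all $u = (u_1, u_2, \dots, u_n)$ near $(\bar y_1, \dots, \bar y_r, \bar x_{r+1}, \dots, \bar x_n)$.
Differentiating both sides of this identity with respect to
each of the variables $u_j$, using the Chain Rule on the left,
gives the matrix equation

$$\begin{pmatrix} a_{ij} \end{pmatrix} \begin{pmatrix} A & B \\ 0 & I \end{pmatrix} = \begin{pmatrix} I & 0 \\ C & 0 \end{pmatrix}$$

(A is r × r, B is r × (n − r), C is (m − r) × r, and I is an identity matrix of the appropriate size)
relating the a's, A's, B's, and C's. This means that the
A's, B's, C's give the solution of the equations

$$dy_i = \sum_{j=1}^n a_{ij}\,dx_j \qquad (i = 1, 2, \dots, m)$$

as was to be shown.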


Implicit Function Theorem 


The proof that (2) implies (i) and (ii) is deduced from the
Chain Rule exactly as in the affine case. The functions

$$y_i = f_i(x_1, \dots, x_r, x_{r+1}, \dots, x_n) \qquad (i = 1, 2, \dots, r)$$

and

$$x_i = g_i(y_1, \dots, y_r, x_{r+1}, \dots, x_n) \qquad (i = 1, 2, \dots, r)$$

are by assumption inverse to each other, consequently

$$\frac{\partial(y_1, \dots, y_r)}{\partial(x_1, \dots, x_r)} \cdot \frac{\partial(x_1, \dots, x_r)}{\partial(y_1, \dots, y_r)} = 1$$

and (i) follows. On the other hand, the functions $h_i$ can be
used to write (1) as a composed map $(x_1, x_2, \dots, x_n) \to
(y_1, y_2, \dots, y_r) \to (y_1, y_2, \dots, y_m)$ and (ii) follows
from the Chain Rule and the fact that an (r + 1)-form
in r variables is necessarily zero.

To show that (i) and (ii) imply (2) one uses a process
of step-by-step elimination exactly as in the affine case,
first rearranging the first r of the x's so that the stronger
condition

$$\frac{\partial(y_1, y_2, \dots, y_k)}{\partial(x_1, x_2, \dots, x_k)} \neq 0 \qquad (k = 1, 2, \dots, r) \tag{i'}$$

is satisfied. However, the elimination of one variable
from one equation cannot be accomplished by subtracting
and dividing as it was in the affine case and one must
appeal to the following theorem.


Elimination Theorem 


Given an equation of the form

$$y = f(x_1, x_2, \dots, x_n) \tag{9}$$

in which f is a differentiable function, and given a point

$$\bar y = f(\bar x_1, \bar x_2, \dots, \bar x_n)$$

at which $\partial y/\partial x_1 \neq 0$, there exist a number ε > 0 and a
differentiable function $g(y, x_2, x_3, \dots, x_n)$ such that the
equation

$$x_1 = g(y, x_2, x_3, \dots, x_n) \tag{10}$$

is equivalent to (9) at all points within ε of $(\bar y, \bar x_1, \bar x_2,
\dots, \bar x_n)$; that is, f and g are both defined at all points
$(y, x_1, x_2, \dots, x_n)$ satisfying $|y - \bar y| < \varepsilon$, $|x_1 - \bar x_1| <
\varepsilon$, ..., $|x_n - \bar x_n| < \varepsilon$ and the equation (9) holds at such
a point if and only if (10) does.

Using the Elimination Theorem to eliminate variables 
one-by-one (reducing e whenever necessary) the condition 
(i’) guarantees, exactly as in the affine case, that the next 
equation can always be solved for the next unknown until 
the equations have the form 


$$\begin{aligned}
x_i &= g_i(y_1, y_2, \dots, y_r, x_{r+1}, \dots, x_n) && (i = 1, 2, \dots, r)\\
y_i &= h_i(y_1, y_2, \dots, y_r, x_{r+1}, \dots, x_n) && (i = r+1, \dots, m).
\end{aligned}$$

The condition (ii) is now to be used to show that the
functions $h_i$ are independent of the x's, i.e. that

$$\frac{\partial h_i}{\partial x_j} = 0 \qquad (i = r+1, \dots, m;\ j = r+1, \dots, n).$$

But

$$\frac{\partial h_i}{\partial x_j} = \frac{\partial(y_1, y_2, \dots, y_r, y_i)}{\partial(y_1, y_2, \dots, y_r, x_j)}$$

and, by the Chain Rule,

$$0 = \frac{\partial(y_1, y_2, \dots, y_r, y_i)}{\partial(x_1, x_2, \dots, x_r, x_j)} = \frac{\partial h_i}{\partial x_j} \cdot \frac{\partial(y_1, y_2, \dots, y_r)}{\partial(x_1, x_2, \dots, x_r)},$$

so $\partial h_i/\partial x_j = 0$ by (i) and (ii) exactly as before.

Thus the proof of the Implicit Function Theorem is
reduced to the proof of the Elimination Theorem (which
is one half of the special case m = r = 1 of the Implicit
Function Theorem). The proof of the Elimination
Theorem by the method of successive approximations is
given in §7.1. (For an alternative proof see the exercise
below.)


Exercise

1 Fill in the details of the following proof of the Elimination
Theorem: Let $a = \frac{\partial f}{\partial x_1}(\bar x_1, \bar x_2, \dots, \bar x_n)$ and let $F(x_1, x_2,
\dots, x_n) = \frac{1}{a}[f(x_1, x_2, \dots, x_n) - f(\bar x_1, x_2, x_3, \dots, x_n)]$. If
the equation $z = F(x_1, x_2, \dots, x_n)$ can be solved locally to
give $x_1 = G(z, x_2, x_3, \dots, x_n)$ then the equation $y =
f(x_1, x_2, \dots, x_n)$ can be solved locally by solving $F(x_1, x_2,
\dots, x_n) = \frac{1}{a}[y - f(\bar x_1, x_2, \dots, x_n)]$, to obtain

$$x_1 = G\left(\frac{1}{a}[y - f(\bar x_1, x_2, \dots, x_n)], x_2, x_3, \dots, x_n\right) = g(y, x_2, x_3, \dots, x_n)$$

where g is defined by this equation. Conclude that one can
assume at the outset that the function $f(x_1, x_2, \dots, x_n)$ of
the Elimination Theorem satisfies the additional conditions

$$\frac{\partial f}{\partial x_1}(\bar x_1, \bar x_2, \dots, \bar x_n) = 1, \qquad f(\bar x_1, x_2, x_3, \dots, x_n) = 0.$$

Let σ > 0 be a small number (any number less than 1 will do)
and let δ > 0 be such that $\partial f/\partial x_1$ is within σ of 1 whenever
$(x_1, x_2, \dots, x_n)$ is within δ of $(\bar x_1, \bar x_2, \dots, \bar x_n)$, i.e.
$|x_i - \bar x_i| < \delta$ (i = 1, 2, ..., n). For all $(x_2, x_3, \dots, x_n)$
within δ of $(\bar x_2, \bar x_3, \dots, \bar x_n)$ the function $f(x_1, x_2, \dots, x_n)$ is
increasing for $x_1$ in the interval $\{\bar x_1 - \delta < x_1 < \bar x_1 + \delta\}$.
At the left end of this interval its value is at most −(1 − σ)δ
and at the right end the value is at least (1 − σ)δ. [Use the
Fundamental Theorem.] By the Intermediate Value Theorem*
conclude that for all $(y, x_2, x_3, \dots, x_n)$ such that |y| <
(1 − σ)δ and $|x_i - \bar x_i| < \delta$ there is a unique $x_1$ in the
interval $|x_1 - \bar x_1| < \delta$ such that $y = f(x_1, x_2, \dots, x_n)$.
Denote this uniquely determined value by $g(y, x_2, x_3, \dots, x_n)$.

*A continuous function f(x) on an
interval {a ≤ x ≤ b} assumes all
values between f(a) and f(b).


Set ε = (1 − σ)δ and show that the conclusions of the
Elimination Theorem are satisfied, except that it remains to
show that $g(y, x_2, x_3, \dots, x_n)$ is differentiable. Given
$(y, x_2, x_3, \dots, x_n)$ within ε of $(0, \bar x_2, \bar x_3, \dots, \bar x_n)$ and given
$(h_1, h_2, \dots, h_n)$ the quantity

$$\frac{\Delta x_1}{s} = \frac{g(y + sh_1, x_2 + sh_2, \dots, x_n + sh_n) - g(y, x_2, \dots, x_n)}{s}$$

can be estimated by setting

$$x_1' = g(y + sh_1, x_2 + sh_2, \dots, x_n + sh_n), \qquad x_1 = g(y, x_2, x_3, \dots, x_n)$$

so that

$$y + sh_1 = f(x_1', x_2 + sh_2, \dots, x_n + sh_n), \qquad y = f(x_1, x_2, x_3, \dots, x_n)$$

and using the estimate

$$sh_1 \approx \frac{\partial y}{\partial x_1}(x_1' - x_1) + \frac{\partial y}{\partial x_2}sh_2 + \dots + \frac{\partial y}{\partial x_n}sh_n,$$

so

$$\frac{\Delta x_1}{s} \approx \left(\frac{\partial y}{\partial x_1}\right)^{-1}\left(h_1 - \frac{\partial y}{\partial x_2}h_2 - \dots - \frac{\partial y}{\partial x_n}h_n\right)$$

where $\partial y/\partial x_1$ is evaluated at $(g(y, x_2, \dots, x_n), x_2, x_3, \dots, x_n)$.
Show that the error in this estimate goes to zero as s → 0.
Conclude that g is continuous, that its first partial derivatives
exist, and that they are continuous functions of $(y, x_2, x_3,
\dots, x_n)$. [The composition of continuous functions is continuous.]
Thus g is differentiable and the theorem is proved.
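The existence part of this exercise is constructive enough to compute with. The sketch below finds $g(y, x_2)$ by bisection, one way of locating the value promised by the Intermediate Value Theorem when f is increasing in $x_1$; the sample function f and the interval [−1/2, 1/2] are assumptions chosen so that $\partial f/\partial x_1 > 0$ there. It then compares difference quotients of g with the estimate for $\Delta x_1/s$ given above.

```python
def f(x1, x2):
    """Sample function (hypothetical): df/dx1 = 1 + 3*x1**2 + x2 > 0 when |x2| < 1/2."""
    return x1 + x1**3 + x1 * x2


def df_dx1(x1, x2):
    return 1 + 3 * x1**2 + x2


def df_dx2(x1, x2):
    return x1


def g(y, x2, lo=-0.5, hi=0.5, tol=1e-13):
    """The unique x1 in [lo, hi] with f(x1, x2) = y, found by bisection.

    Assumes f is increasing in x1 on [lo, hi] and f(lo, x2) <= y <= f(hi, x2),
    which is the situation in which the Intermediate Value Theorem applies."""
    if not f(lo, x2) <= y <= f(hi, x2):
        raise ValueError("y is not between the end point values")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid, x2) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


y, x2, s = 0.1, 0.2, 1e-4
x1 = g(y, x2)
# Difference quotients of g versus the estimate (df/dx1)^(-1) * (h1 - df/dx2 * h2)
dg_dy = (g(y + s, x2) - g(y - s, x2)) / (2 * s)
dg_dx2 = (g(y, x2 + s) - g(y, x2 - s)) / (2 * s)
print(x1, f(x1, x2))
print(dg_dy, 1 / df_dx1(x1, x2))
print(dg_dx2, -df_dx2(x1, x2) / df_dx1(x1, x2))
```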


5.4 Application: Lagrange Multipliers


One of the most direct and useful applications of differential
calculus is to the problem of finding maxima
and minima. It is based on this observation: If a differentiable
function of one variable y = f(x) assumes a
maximum or a minimum value at a point $\bar x$ inside its
domain then dy = f′(x) dx must be zero at that point, i.e.
$f'(\bar x) = 0$. This theorem can be deduced from the
Implicit Function Theorem as follows: If dy ≠ 0 at $\bar x$
then the equation y = f(x) can be solved for x = g(y)
at all points y sufficiently near $\bar y = f(\bar x)$. Setting
$x = g(\bar y \pm \varepsilon)$ then gives points where $f(x) = \bar y \pm \varepsilon$, which
shows that $\bar y$ is neither a maximum nor a minimum value
of y = f(x). Thus, if $\bar y$ is a maximum or a minimum,
then dy = 0.

*It is not difficult to show that there
exists a point where y is a maximum
(see Chapter 9) but this is not the
question. The question is: Assuming
there is a maximum, how can it be
found?

The usefulness of this theorem lies in the fact that the 
equation f’(x) = 0 will normally have only a few solu- 
tions x and, therefore, that the set where y = f(x) could 
possibly assume a maximum or a minimum value is 
reduced to a few points. One can then evaluate the func- 
tion at each of these points and know that the largest of 
them is the maximum, and the smallest the minimum, 
among all values of y = f(x). The steps in the solution 
are: 


I. Find the equation f′(x) = 0 which must be satisfied
at maxima and minima.

II. Find all solutions of the equation f’(x) = 0. 

III. Test each solution to determine whether or not it 
is a maximum (or minimum). 


Step I of this program is the application of differential 
calculus to the problem, and it is Step I which is general- 
ized to functions of several variables by the method of 
Lagrange multipliers. 

Note that if one is looking for the maximum of
y = f(x) on an interval {a ≤ x ≤ b} then the condition
f′(x) = 0 need not be satisfied at the maximum if the
maximum occurs at an end point x = a or x = b. This
is not a serious problem since it means only that these 
two points cannot be excluded by the condition f’(x) = 0, 
and that the number of points which require further 
investigation at Step III is increased by two. However, in 
maximizing a function of more than one variable over a 
domain with boundary, the fact that the condition dy = 0 
need not be satisfied if the maximum occurs on the 
boundary means that there remains an infinite set of 
points which are not excluded by the condition dy = 0. 
This means that further conditions must be found which 
will exclude as many of these boundary points as 
possible. 
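A minimal sketch of Steps I to III for a polynomial on a closed interval {a ≤ x ≤ b}, using NumPy's polynomial helpers (np.polyder, np.roots, np.polyval). The sample polynomial x³ − 3x on {−2 ≤ x ≤ 3} is an arbitrary illustration in which the maximum, 18, occurs at the end point x = 3, so the two end points must be kept as candidates.

```python
import numpy as np


def extreme_values_on_interval(coeffs, a, b):
    """Steps I to III for a polynomial y = f(x) on {a <= x <= b}.

    `coeffs` lists the coefficients of f, highest power first (NumPy's convention).
    Returns ((max value, where), (min value, where))."""
    derivative = np.polyder(coeffs)                      # Step I: f'(x) = 0
    roots = np.roots(derivative)                         # Step II: its solutions
    candidates = [float(r.real) for r in roots
                  if abs(r.imag) < 1e-12 and a < r.real < b]
    candidates += [a, b]                                 # end points are not excluded
    values = [(float(np.polyval(coeffs, x)), float(x)) for x in candidates]
    return max(values), min(values)                      # Step III: compare values


# f(x) = x^3 - 3x on {-2 <= x <= 3}: f'(x) = 0 at x = -1 and x = 1.
print(extreme_values_on_interval([1, 0, -3, 0], -2, 3))
```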

Consider for example the problem of finding a* point
where a differentiable function y = f(u, v) of two
variables assumes a maximum value on the disk
{u² + v² ≤ 1}. If a maximum occurs at an interior
point $(\bar u, \bar v)$ of the disk, then dy = 0 at $(\bar u, \bar v)$ by the same
argument as before: If dy ≠ 0 then the equation
y = f(u, v) can be solved u = g(y, v) or v = g(u, y) by
the Implicit Function Theorem; then $u = g(\bar y + \varepsilon, \bar v)$ or
$v = g(\bar u, \bar y + \varepsilon)$ gives points (u, v) where $f(u, v) >
f(\bar u, \bar v)$; hence if $f(\bar u, \bar v)$ is a maximum the assumption
dy ≠ 0 must be false. Thus the equations

$$\frac{\partial y}{\partial u} = 0, \qquad \frac{\partial y}{\partial v} = 0 \tag{1}$$

must be satisfied at any maximum which occurs inside the
disk. The equations (1) give two equations in two unknowns
(u, v) and normally will exclude most points of
the disk as possible maxima. However, as before, the
equations (1) need not be satisfied at a maximum if the
maximum occurs on the bounding circle {u² + v² = 1}
of the disk. (The argument that dy = 0 at a maximum
fails because the solutions (u, v) of $\bar y + \varepsilon = f(u, v)$ may
all lie outside the disk and $\bar y$ may still be the largest value
assumed by f(u, v) for {u² + v² ≤ 1}.)

In order to find conditions satisfied by a point on the
bounding circle {u² + v² = 1} at which a maximum
occurs (Step I), consider the map $\mathbb{R}^2 \to \mathbb{R}^2$ defined by

$$X = u^2 + v^2, \qquad y = f(u, v).$$

If dX dy ≠ 0 at $(\bar u, \bar v)$ these equations can be solved

$$u = g_1(X, y), \qquad v = g_2(X, y)$$

for all (X, y) near $(\bar u^2 + \bar v^2, f(\bar u, \bar v))$. If $(\bar u, \bar v)$ lies on the
circle X = 1 then $\bar y = f(\bar u, \bar v)$ is not a maximum on the
disk because

$$u = g_1(1, \bar y + \varepsilon), \qquad v = g_2(1, \bar y + \varepsilon)$$

gives a point of the disk (in fact a point of the circle
X = 1) at which $f(u, v) = \bar y + \varepsilon > f(\bar u, \bar v)$. Therefore
if $\bar y = f(\bar u, \bar v)$ is a maximum then dX dy must be zero at
$(\bar u, \bar v)$. This is the desired condition.


Example 


Find the maximum value of y = 8u + 3v on the disk
{u² + v² ≤ 1}. Here dy = 8 du + 3 dv is never zero so
there can be no maximum in the interior of the disk. The
maximum must therefore occur at a point on the circle
{u² + v² = 1} at which

$$\begin{aligned}
dX\,dy &= 0\\
(2u\,du + 2v\,dv)(8\,du + 3\,dv) &= 0\\
(6u - 16v)\,du\,dv &= 0\\
v &= \tfrac{3}{8}u\\
u^2 + \left(\tfrac{3}{8}u\right)^2 &= 1\\
u = \pm\frac{8}{\sqrt{73}}, \quad v &= \pm\frac{3}{\sqrt{73}}\\
y = 8u + 3v &= \pm\sqrt{73}.
\end{aligned}$$

Thus there are only two points at which a maximum
could occur. Clearly the maximum is √73 at (8/√73,
3/√73) and the minimum is −√73 at (−8/√73,
−3/√73). (Strictly speaking, it has been proved that
if there is a maximum then it must be √73. The fact that
there must be a maximum is proved in §9.4.)
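A brute-force check of this example (illustrative only, not part of the method): sampling the circle u² + v² = 1 finely should give a largest value of 8u + 3v close to √73, attained near (8/√73, 3/√73).

```python
import math


def y(u, v):
    return 8 * u + 3 * v


N = 100_000
angles = [2 * math.pi * k / N for k in range(N)]
best_value, best_angle = max((y(math.cos(t), math.sin(t)), t) for t in angles)
u_best, v_best = math.cos(best_angle), math.sin(best_angle)
print(best_value, math.sqrt(73))
print((u_best, v_best), (8 / math.sqrt(73), 3 / math.sqrt(73)))
```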

A similar technique applies to the problem of finding
maxima and minima of functions y = f(u, v) on other
domains of the uv-plane. Consider, for example, the
domain consisting of all points which lie under the
parabola u² + v = 1 and above the u-axis. If a maximum
occurs at a point $(\bar u, \bar v)$ inside the domain then dy = 0 at
$(\bar u, \bar v)$. However, this condition need not be satisfied at a
maximum which occurs on the boundary. If a maximum
occurs at a point inside the line segment {−1 < u < 1,
v = 0} then the Implicit Function Theorem applied to
the equations

$$X = v, \qquad y = f(u, v)$$

shows as before that dX dy = 0 at the point where the
maximum occurs, i.e. $dv\left(\frac{\partial y}{\partial u}du + \frac{\partial y}{\partial v}dv\right) = 0$ or simply
$\frac{\partial y}{\partial u} = 0$ at the maximum. If a maximum occurs at a point
inside the parabolic arc {u² + v = 1, −1 < u < 1}
then the Implicit Function Theorem applied to

$$X = u^2 + v, \qquad y = f(u, v)$$

implies that dX dy = 0 at the maximum. However, a
maximum value can also occur at either of the points
(±1, 0) without any further condition being satisfied.


Example 


Find the point of the above domain which is farthest
from the point (1/4, −1). To maximize the distance
$\sqrt{(u - \frac14)^2 + (v + 1)^2}$ is the same as to maximize the
square of the distance

$$y = \left(u - \tfrac14\right)^2 + (v + 1)^2.$$

Since

$$dy = 2\left(u - \tfrac14\right)du + 2(v + 1)\,dv$$

is zero only at (u, v) = (1/4, −1), there is no maximum
inside the domain. A maximum inside the line segment
{−1 < u < 1, v = 0} could occur only where ∂y/∂u = 0,
i.e. only at (1/4, 0), but this is clearly the minimum. A
maximum inside the parabolic arc could occur only where

$$\begin{aligned}
(2u\,du + dv)\left(2\left(u - \tfrac14\right)du + 2(v + 1)\,dv\right) &= 0\\
\left(4u(v + 1) - 2\left(u - \tfrac14\right)\right)du\,dv &= 0.
\end{aligned}$$

Using u² + v = 1 this gives

$$\begin{aligned}
4u(2 - u^2) - 2\left(u - \tfrac14\right) &= 0\\
8u^3 - 12u - 1 &= 0.
\end{aligned}$$

Evaluating this polynomial at u = −2, −1, 0, 1, and 2
shows that it has a root between −2 and −1, another
between −1 and 0, and a third between 1 and 2. Thus
only one root lies in the range being considered, and if the
farthest point lies inside the parabolic arc then it must be
the point $(\bar u, 1 - \bar u^2)$ where $\bar u$ is the root of 8u³ − 12u − 1
= 0 between −1 and 0. Finally, the farthest point
could be one of the points (±1, 0). However, it is easily
verified that the point (0, 1) of the parabolic arc is farther from (1/4, −1) than either
of these points. Therefore the farthest point is the point
$(\bar u, 1 - \bar u^2)$ above. (Strictly speaking, it has been shown
that if there is a farthest point then it is $(\bar u, 1 - \bar u^2)$. The
geometrically evident fact that there must be a farthest
point is proved in §9.4.)
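The numerical work in this example can be sketched with NumPy: np.roots finds the three real roots of 8u³ − 12u − 1, the one in (−1, 1) gives the candidate point on the parabolic arc, and a dense sampling of the boundary arc (a brute-force check added here, not part of the method) confirms that it is the point of the arc farthest from (1/4, −1).

```python
import numpy as np

TARGET = (0.25, -1.0)


def dist2(u, v):
    """Square of the distance from (u, v) to (1/4, -1); works on NumPy arrays too."""
    return (u - TARGET[0]) ** 2 + (v - TARGET[1]) ** 2


roots = np.roots([8, 0, -12, -1])                        # 8u^3 - 12u - 1 = 0
real_roots = sorted(float(r.real) for r in roots if abs(r.imag) < 1e-12)
inside = [r for r in real_roots if -1 < r < 1]           # only one root lies in (-1, 1)
u_bar = inside[0]

for name, (u, v) in [("arc point", (u_bar, 1 - u_bar ** 2)),
                     ("(1, 0)", (1.0, 0.0)),
                     ("(-1, 0)", (-1.0, 0.0))]:
    print(name, (u, v), dist2(u, v))

# Brute-force comparison along the whole boundary arc u^2 + v = 1, -1 <= u <= 1.
grid = np.linspace(-1.0, 1.0, 200_001)
arc_best = float(grid[np.argmax(dist2(grid, 1 - grid ** 2))])
print(arc_best, u_bar)
```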

Many physics problems involve finding maxima and
minima subject to constraints. For example, if a particle
which is constrained to move along the parabola
u² + v = 1 is being attracted toward the point (1/4, −1),
it will eventually come to rest at one of the two points of
the parabola which are (locally) nearest to (1/4, −1), that
is, at one of the points $(\bar u, 1 - \bar u^2)$ where $\bar u$ is the root of
8u³ − 12u − 1 = 0 between −2 and −1 or the root
between 1 and 2. (The third root corresponds to a point
of unstable equilibrium and the particle could conceivably
stop there as well.)

[Figure: the ellipse 4u² + v² = 4 and the line 3u + 2v = 25.]

In such problems there are
often many variables and many constraints. As a simple
example consider the following: A particle constrained
to lie on the ellipse 4u² + v² = 4 attracts a particle
constrained to lie on the line 3u + 2v = 25. At what
position do they come to rest? Let (x, y) denote the
coordinates of the particle on the line and let (u, v)
denote the coordinates of the particle on the ellipse. Then
the position of the ‘system’ is described by the four
numbers (u, v, x, y) subject to the two constraints


$$4u^2 + v^2 = 4, \qquad 3x + 2y = 25 \tag{2}$$

which leave two degrees of freedom. At a point of
equilibrium the function

$$Y = (u - x)^2 + (v - y)^2$$

must have a local minimum subject to the constraints
(2). Applying the Implicit Function Theorem to the map
$\mathbb{R}^4 \to \mathbb{R}^3$ defined by

$$U = 4u^2 + v^2, \qquad V = 3x + 2y, \qquad Y = (u - x)^2 + (v - y)^2$$

implies, by the same argument as before, that equilibrium
can occur only at points where dU dV dY = 0. Therefore
the problem is reduced to the algebraic problem of finding
all points at which dU dV dY = 0 which also satisfy the
constraints (2).

This algebraic problem is, however, considerably more 
difficult than the algebraic problems which had to be 
solved in the simpler examples above. The straightfor- 
ward method of solution would be to write dU dV dY in 
terms of (u, v, x, y), to set the 4 coefficients of this 3-form 
equal to zero, and to try to find solutions (u, v, x, y) of 
these four equations which also satisfy the two equations 
(2). Although this can be done and a solution can be
found by this method, there is another method, called
the method of Lagrange multipliers, which is much
simpler.

The method of Lagrange multipliers is to observe that
at any equilibrium point there must be numbers* $\lambda_1, \lambda_2$
(the multipliers) such that

$$dY = \lambda_1\,dU + \lambda_2\,dV. \tag{3}$$

*λ = lambda.
To prove this, write

$$\begin{aligned}
dU &= 8u\,du + 2v\,dv\\
dV &= 3\,dx + 2\,dy\\
dY &= \frac{\partial Y}{\partial u}du + \frac{\partial Y}{\partial v}dv + \frac{\partial Y}{\partial x}dx + \frac{\partial Y}{\partial y}dy.
\end{aligned}$$

At any point (u, v, x, y) satisfying the constraints (2) the
first two of these equations can be solved for two of the
1-forms du, dv, dx, dy in terms of dU, dV and the remaining
two 1-forms. (If u ≠ 0 then du, dx can be expressed
in terms of dU, dV, dv, dy. If u = 0 then by (2) it follows
that v ≠ 0, so dv, dx can be expressed in terms of dU,
dV, du, dy.) Substituting these solutions into the third
equation expresses dY in terms of dU, dV, and the
remaining two of the 1-forms du, dv, dx, dy. If this
expression for dY is not of the form (3) then the elimination
process can be carried one step further to express
three of the 1-forms du, dv, dx, dy in terms of dU, dV, dY
and the remaining one. But this is possible only if
dU dV dY ≠ 0, so it is not possible at a maximum, and
at a maximum (3) must hold.

Using the equation (3), the possible solutions are easily
found. Writing out the du, dv, dx, dy components of (3),
it becomes four equations

$$\begin{aligned}
2(u - x) &= 8\lambda_1 u, & 2(x - u) &= 3\lambda_2,\\
2(v - y) &= 2\lambda_1 v, & 2(y - v) &= 2\lambda_2.
\end{aligned} \tag{3'}$$

These equations, together with (2), give 6 equations in the
6 unknowns $(u, v, x, y, \lambda_1, \lambda_2)$. Equating the two expressions
for 2(u − x) and the two for 2(v − y) gives

$$8u\lambda_1 = -3\lambda_2, \qquad 2v\lambda_1 = -2\lambda_2.$$

Equating two expressions for $-6\lambda_2$ then gives

$$16u\lambda_1 = 6v\lambda_1.$$

Now $\lambda_1 \neq 0$ because if $\lambda_1$ were zero then (3′) would give
u = x, v = y, which is incompatible with (2). Therefore
division by $2\lambda_1$ is possible and gives

$$8u = 3v.$$

Using this in (2) gives

$$\begin{aligned}
4u^2 + \tfrac{64}{9}u^2 &= 4\\
u^2 &= \tfrac{4 \cdot 9}{100} = \tfrac{3^2}{5^2}\\
u = \pm\tfrac{3}{5}, \quad v &= \pm\tfrac{8}{5}.
\end{aligned}$$

Thus equilibrium can occur only when the particle on the
ellipse is at one of the two positions (u, v) = (3/5, 8/5)
or (u, v) = (−3/5, −8/5). If it is at (3/5, 8/5) then the
position (x, y) of the particle on the line must satisfy

$$\begin{aligned}
4(x - u) &= 6\lambda_2 = 6(y - v),\\
4x - 6y &= 4u - 6v = -\tfrac{36}{5},\\
2x - 3y &= -\tfrac{18}{5}.
\end{aligned}$$

Together with 3x + 2y = 25 this gives (x, y) = (339/65,
304/65). In the same way the position (u, v) = (−3/5,
−8/5) determines a unique position (x, y). It is clear
geometrically that this second equilibrium position is
unstable, and that the unique position of stable equilibrium
is (3/5, 8/5, 339/65, 304/65) which solves the given
problem. (Strictly speaking, it has been assumed that
there is a position of stable equilibrium and it has been
proved that no position other than the two found above
can be a position of equilibrium.)
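The solution can be verified in exact rational arithmetic with Python's fractions module: the constraints (2) and all four equations (3′) hold at (3/5, 8/5, 339/65, 304/65) once the multipliers are read off from two of them. As an independent, approximate check added here, the ellipse is parametrized as (cos t, 2 sin t); since 3u + 2v ≤ 5 < 25 on the ellipse, the distance to the line is (25 − 3u − 2v)/√13, and minimizing the numerator over a fine grid of t should land near (3/5, 8/5).

```python
import math
from fractions import Fraction as Fr

u, v = Fr(3, 5), Fr(8, 5)
x, y = Fr(339, 65), Fr(304, 65)

print(4 * u**2 + v**2 == 4, 3 * x + 2 * y == 25)   # the constraints (2)

# Multipliers read off from the dv and dy components of (3')
lam1 = (v - y) / v          # from 2(v - y) = 2 * lam1 * v
lam2 = y - v                # from 2(y - v) = 2 * lam2
print(lam1, lam2)           # -25/13 40/13

# The du and dx components of (3') must then hold as well.
print(2 * (u - x) == 8 * lam1 * u, 2 * (x - u) == 3 * lam2)

# Brute force: points (cos t, 2 sin t) of the ellipse; smaller 25 - 3u - 2v means closer to the line.
N = 200_000
t_best = min((2 * math.pi * k / N for k in range(N)),
             key=lambda t: 25 - 3 * math.cos(t) - 4 * math.sin(t))
print(math.cos(t_best), 2 * math.sin(t_best))
```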

The method of Lagrange multipliers can also be used 
in the first two examples, even though they were simple 
enough to be done without it. For example, in the first 
problem 


l 
max. 


u? + v? 
8u + 3v 


the condition dX dy = 0 can hold at a point only if 
dy = \ dX (for some multiplier \) at this point; this 
follows from the observation that 


dX = 2udu + 2v dv 
dy = 8du+ 3 dv 


can always be solved for one of the 1-forms du, dv in 
terms of dX and the other (u and v cannot both be zero 


Chapter5 | Differential Calculus 


*Many more general maximization 
problems decompose into several 
problems of this type—for example 
the problem of maximizing f(u, v) on 
{(uuv)iv>O0u2+v< 7 
considered above. 


168 


on u? + v? = 1) but that the elimination can be carried 
no further at a maximum (because dX dy = 0 at a 
maximum). Hence dy = \ dX at a maximum. Equating 
the coefficients of du and dv gives two equations 


8 = \: 2u 
3 =X: 2w 
in addition to 
u? +o = 1. 


Eliminating \ gives v = łu and the remainder of the 
solution is as before. 

These techniques can be formulated in general terms 
as follows: Suppose that a differentiable function 


y = f(%1, X2,-- +5 Xn) 


is to be maximized subject to k constraints X; = const. 
(i = 1,2,...,k) where 


Xi = 2i(X1, X2,---5 Xn) (i= 1,2,...,k) 


are differentiable functions.* This problem will be 
abbreviated 


$$\begin{aligned} X_i &= \text{const.} \quad (i = 1, 2, \ldots, k) \\ y &= \max. \end{aligned} \tag{4}$$
At any point $(x_1, x_2, \ldots, x_n)$ where a maximum value $\bar{y}$ is achieved, the $(k+1)$-form $dX_1\,dX_2 \cdots dX_k\,dy$ must be zero. (Otherwise, by the Implicit Function Theorem, the equations $X_i = \text{const.}$, $y = \bar{y} + \epsilon$ could be solved for small $\epsilon > 0$ to give an $(x_1, x_2, \ldots, x_n)$ where $y$ is larger.) The constraints $X_i = \text{const.}$ are said to be non-singular of rank $k$ at a point $P = (x_1, x_2, \ldots, x_n)$ which satisfies them if $dX_1\,dX_2 \cdots dX_k \neq 0$ at $P$, that is, if $X_i = g_i(x_1, x_2, \ldots, x_n)$ defines a map $\mathbb{R}^n \to \mathbb{R}^k$ which is non-singular of rank $k$ at $P$. The constraints $X_i = \text{const.}$ are said to be non-singular of rank $k$ if this holds at all points $P$ which satisfy them. Thus the constraint $u^2 + v^2 = 1$ is non-singular of rank 1, and the constraints (2) are non-singular of rank 2. When the constraints are non-singular of rank $k$, the method of Lagrange multipliers simplifies the problem of finding points at which $dX_1\,dX_2 \cdots dX_k\,dy = 0$.


5.4 | Application: Lagrange Multipliers 169 


Method of Lagrange Multipliers 


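If $P = (x_1, x_2, \ldots, x_n)$ is a solution of the problem
$$\begin{aligned} X_i &= \text{const.} \quad (i = 1, 2, \ldots, k) \\ y &= \max. \end{aligned} \tag{4}$$
and if the constraints $X_i = \text{const.}$ are non-singular of rank $k$ at $P$, then there exist numbers $\lambda_1, \lambda_2, \ldots, \lambda_k$ (the multipliers) such that
$$dy = \lambda_1\,dX_1 + \lambda_2\,dX_2 + \cdots + \lambda_k\,dX_k \tag{5}$$
at $P$. The same is true if $P$ is a solution of the problem
$$\begin{aligned} X_i &= \text{const.} \quad (i = 1, 2, \ldots, k) \\ y &= \min. \end{aligned}$$
at which the constraints are non-singular of rank $k$.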


Proof 
Since $dX_1\,dX_2 \cdots dX_k \neq 0$ at $P$ by assumption, the equations
$$dX_i = \sum_{j=1}^{n} \frac{\partial X_i}{\partial x_j}\,dx_j \qquad (i = 1, 2, \ldots, k)$$
can be solved for $k$ of the 1-forms $dx_1, dx_2, \ldots, dx_n$. By rearranging the $x$'s if necessary, it can be assumed that $dx_1, dx_2, \ldots, dx_k$ can be expressed in terms of $dX_1, dX_2, \ldots, dX_k, dx_{k+1}, \ldots, dx_n$. Substituting these expressions in $dy = \sum_{j=1}^{n} \frac{\partial y}{\partial x_j}\,dx_j$ gives
$$dy = \lambda_1\,dX_1 + \cdots + \lambda_k\,dX_k + \lambda_{k+1}\,dx_{k+1} + \cdots + \lambda_n\,dx_n.$$
If this expression is not of the form (5), that is, if some $\lambda_j$ with $j > k$ is non-zero, then another of the $dx$'s can be eliminated, and the $(k+1)$-form $dX_1\,dX_2 \cdots dX_k\,dy$ is not zero at $P$. But then, by the previous argument, $P$ is not a maximum or a minimum. Hence if $P$ is a maximum or a minimum then (5) holds, as was to be shown.


When the $n$ components of the 1-forms in (5) are written out, (5) gives $n$ equations in addition to the $k$ equations $X_i = \text{const.}$ Hence there are $n + k$ equations in the $n + k$ unknowns $(x_1, x_2, \ldots, x_n, \lambda_1, \lambda_2, \ldots, \lambda_k)$. Explicitly,
$$\begin{aligned} \frac{\partial y}{\partial x_i} &= \sum_{j=1}^{k} \lambda_j \frac{\partial X_j}{\partial x_i} \qquad (i = 1, 2, \ldots, n) \\ X_j &= \text{const.} \qquad (j = 1, 2, \ldots, k). \end{aligned}$$

This generalizes Step I of the process above to problems of the form (4) in which the constraints are non-singular of rank $k$.* Step II is to solve these $n + k$ equations in $n + k$ unknowns, and Step III is to test each solution to determine whether or not it is a maximum.

A point $(x_1, x_2, \ldots, x_n)$ is said to be a critical point of the problem (4) if it satisfies the constraints and if there exist numbers $\lambda_1, \lambda_2, \ldots, \lambda_k$ such that (5) holds† at $(x_1, x_2, \ldots, x_n)$. The steps are then:

I. Set up the equations satisfied by the critical points.
II. Find all critical points.
III. Look for a maximum (or minimum) among the critical points.

*If $k = 0$, that is, if there are no constraints, then (5) becomes $dy = 0$. If $n = 2$ this is the case considered at the beginning of the section.

†Picturesquely speaking, the equation $dy = \lambda_1\,dX_1 + \cdots + \lambda_k\,dX_k$ implies ‘If $dX_1 = 0, \ldots, dX_k = 0$ then $dy = 0$’, that is, ‘no change in the $X_i$ implies no change (infinitesimally) in $y$’. For this reason, critical points are also called ‘stationary points’ of $y$ relative to the constraints $X_i = \text{const.}$
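For a single constraint ($k = 1$), you can test directly whether a given point is critical in the sense just defined. Fit the multiplier that best matches $dy = \lambda\,dX$ (a least-squares fit of the coefficients of the $dx_i$), then check whether anything is left over. The helpers below are hypothetical and written only for illustration. They assume the partial derivatives are supplied as Python functions, and they check the equations of Step I at a candidate point without carrying out Step II.

```python
import math

def multiplier_residual(grad_y, grad_X):
    """One constraint (k = 1): the multiplier lambda that best fits
    grad_y = lambda * grad_X, together with the size of what is left over.
    Condition (5) holds at the point exactly when the leftover is zero."""
    gg = sum(g * g for g in grad_X)
    if gg == 0:
        raise ValueError("dX = 0 here, so the constraint is singular at this point")
    lam = sum(a * g for a, g in zip(grad_y, grad_X)) / gg
    residual = math.sqrt(sum((a - lam * g) ** 2 for a, g in zip(grad_y, grad_X)))
    return lam, residual

def is_critical(point, X, c, grad_y, grad_X, tol=1e-9):
    """True if point satisfies X = c and (5) holds there, up to tol."""
    if abs(X(point) - c) > tol:
        return False
    _, residual = multiplier_residual(grad_y(point), grad_X(point))
    return residual < tol

# The first problem of this section: X = u^2 + v^2 = 1 and y = 8u + 3v.
def X(p):
    return p[0] ** 2 + p[1] ** 2

def grad_X(p):
    return (2 * p[0], 2 * p[1])

def grad_y(p):
    return (8.0, 3.0)

s = math.sqrt(73)
print(is_critical((8 / s, 3 / s), X, 1.0, grad_y, grad_X))  # the maximum found above
print(is_critical((1.0, 0.0), X, 1.0, grad_y, grad_X))      # on the circle but not critical
```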


Step II involves solving n + k equations in n + k 
unknowns and is therefore not feasible in most cases. 
However, there are many cases in which these equations 
have a particularly simple form and can either be solved 
explicitly or can be used to derive useful information 
about the critical points. The remainder of this section is 
devoted to a few such cases which are particularly useful 
in applications. 


Example 


Find the minimum and maximum values of $Au + Bv$ on the disk $u^2 + v^2 \le 1$. (The case $A = 8$, $B = 3$ was solved above.) There are no critical points in the interior of the disk unless $A = B = 0$, that is, unless the function $Au + Bv$ is identically zero and 0 is both maximum and minimum. Otherwise the only critical points are on the bounding circle $u^2 + v^2 = 1$. The constraint $u^2 + v^2 = 1$ is non-singular of rank 1, and the method of Lagrange multipliers applies, giving
$$A\,du + B\,dv = \lambda(2u\,du + 2v\,dv).$$
Hence
$$A = 2\lambda u, \qquad B = 2\lambda v.$$
If these equations are satisfied then
$$Au + Bv = 2\lambda u^2 + 2\lambda v^2 = 2\lambda,$$
so the maximum and minimum values are among the possible values of $2\lambda$. But
$$A^2 + B^2 = (2\lambda)^2,$$
hence
$$2\lambda = \pm\sqrt{A^2 + B^2}.$$
If $2\lambda$ has either of these two values then $u = A/2\lambda$ and $v = B/2\lambda$ are determined. Hence there are exactly two critical points, and they occur when $u : v = A : B$. The desired maximum and minimum values are therefore $\pm\sqrt{A^2 + B^2}$. Note that this formula holds when $A = B = 0$ as well.
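Here is a brute-force check of the formula $\pm\sqrt{A^2 + B^2}$. It is sketched under the assumption that sampling the circle finely is an acceptable stand-in for the exact extremes; the sample count is an arbitrary choice. It also confirms that the maximum occurs in the direction $u : v = A : B$, that is, at the angle $\operatorname{atan2}(B, A)$.

```python
import math

def extremes_on_circle(A, B, samples=100_000):
    """Sampled (max, min) of A*u + B*v on u^2 + v^2 = 1, each as (value, angle t)."""
    values = [(A * math.cos(t) + B * math.sin(t), t)
              for t in (2 * math.pi * i / samples for i in range(samples))]
    return max(values), min(values)

A, B = 8.0, 3.0
(vmax, tmax), (vmin, tmin) = extremes_on_circle(A, B)
print(vmax, math.hypot(A, B))    # sampled maximum vs. sqrt(A^2 + B^2)
print(vmin, -math.hypot(A, B))   # sampled minimum vs. -sqrt(A^2 + B^2)
print(tmax, math.atan2(B, A))    # angle of the maximum vs. the direction u : v = A : B
```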


Example 


Find the minimum and maximum values of $A_1x_1 + A_2x_2 + \cdots + A_nx_n$ on the $n$-dimensional ball $x_1^2 + x_2^2 + \cdots + x_n^2 \le K$, where $A_1, A_2, \ldots, A_n, K$ are fixed numbers with $K > 0$. As above, if not all $A_i$ are zero then the only critical points are on the $(n-1)$-dimensional bounding sphere $\{x_1^2 + x_2^2 + \cdots + x_n^2 = K\}$. The constraint $x_1^2 + x_2^2 + \cdots + x_n^2 = K$ is non-singular of rank 1 because $d(x_1^2 + x_2^2 + \cdots + x_n^2)$ is zero only at $(0, 0, \ldots, 0)$, which does not satisfy the constraint. By the method of Lagrange multipliers, the maximum and minimum must therefore occur at points which satisfy
$$\begin{aligned} A_i &= 2\lambda x_i \qquad (i = 1, 2, \ldots, n) \\ x_1^2 + x_2^2 + \cdots + x_n^2 &= K \end{aligned}$$
for some $\lambda$. Since
$$A_1x_1 + A_2x_2 + \cdots + A_nx_n = 2\lambda \sum x_i^2 = 2\lambda K$$
and
$$A_1^2 + A_2^2 + \cdots + A_n^2 = (2\lambda)^2 K,$$
there are only two possible values of $\lambda$,
$$\lambda = \pm\frac{\sqrt{A_1^2 + \cdots + A_n^2}}{2\sqrt{K}},$$
each of which corresponds to a unique critical point
$$x_i = \frac{A_i}{2\lambda} \qquad (i = 1, 2, \ldots, n)$$




at which the value of the function is
$$2\lambda K = \pm\sqrt{A_1^2 + \cdots + A_n^2}\,\sqrt{K} = \pm\sqrt{A_1^2 + \cdots + A_n^2}\,\sqrt{x_1^2 + \cdots + x_n^2}.$$
It follows that these are the maximum and minimum values and that each is assumed at exactly one point. These conclusions can also be stated as follows. The inequality
$$|A_1x_1 + \cdots + A_nx_n| \le \sqrt{A_1^2 + \cdots + A_n^2}\,\sqrt{x_1^2 + \cdots + x_n^2} \tag{6}$$
holds for all pairs of $n$-tuples of numbers $(A_1, \ldots, A_n)$ and $(x_1, \ldots, x_n)$. Moreover, equality holds only if
$$x_1^2 + \cdots + x_n^2 = 0, \quad \text{i.e. } x_i = 0 \text{ for } i = 1, 2, \ldots, n$$
or
$$A_1^2 + \cdots + A_n^2 = 0, \quad \text{i.e. } A_i = 0 \text{ for } i = 1, 2, \ldots, n$$
or there is a non-zero number $\mu$* ($= 2\lambda$) such that
$$A_i = \mu x_i \qquad (i = 1, 2, \ldots, n).$$
The inequality (6) is called the Schwarz inequality. The conditions under which equality can hold can be stated more symmetrically: there exist numbers $\mu_1, \mu_2$, not both zero, such that
$$\mu_1 A_i = \mu_2 x_i \qquad (i = 1, 2, \ldots, n).$$
This includes all three of the cases $A_i = 0$, $x_i = 0$, $A_i = \mu x_i$.

Geometrically, $\sqrt{x_1^2 + \cdots + x_n^2}$ can be regarded as the length of the line segment from the origin to $(x_1, x_2, \ldots, x_n)$, and $\sqrt{A_1^2 + \cdots + A_n^2}$ as the length of the line segment from the origin to $(A_1, A_2, \ldots, A_n)$. The Schwarz inequality then says two things: the absolute value of the so-called ‘dot product’ $A_1x_1 + \cdots + A_nx_n$ is at most the product of the lengths, and equality holds if and only if the line segments are collinear.†

*$\mu$ = mu.

†A ‘line segment’ which degenerates to a single point is considered to be collinear with any line segment.
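You can spot-check the Schwarz inequality (6) and its equality case on random data. This is only a numerical illustration, not a proof; the random ranges, the seed and the rounding slack are arbitrary choices made here.

```python
import math
import random

def dot(A, x):
    return sum(a * t for a, t in zip(A, x))

def length(v):
    return math.sqrt(dot(v, v))

random.seed(0)
for _ in range(1000):
    n = random.randint(1, 6)
    A = [random.uniform(-5, 5) for _ in range(n)]
    x = [random.uniform(-5, 5) for _ in range(n)]
    # Inequality (6), with a little slack for floating-point rounding.
    assert abs(dot(A, x)) <= length(A) * length(x) + 1e-9

# Equality case: A_i = mu * x_i with mu != 0 (collinear segments).
x = [1.0, -2.0, 3.0]
mu = -2.5
A = [mu * t for t in x]
print(abs(dot(A, x)), length(A) * length(x))  # the two sides agree up to rounding
```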


Example 


Find the minimum and maximum values of $Ax + By$ on the curve $|x|^3 + |y|^3 = K$, where $A, B, K$ are fixed and $K > 0$. It was shown in Exercise 6, §5.2, that $|x|^3$ is a differentiable function with derivative $3x|x|$.

[Figure: the curve $|x|^3 + |y|^3 = K$ in the $xy$-plane.]

Since the 1-form $d(|x|^3 + |y|^3) = 3x|x|\,dx + 3y|y|\,dy$ is not zero at any point satisfying the constraint $|x|^3 + |y|^3 = K > 0$, the method of Lagrange multipliers applies. It says that if a maximum or minimum value is assumed at $(x, y)$ then there must be a $\lambda$ such that
$$A\,dx + B\,dy = \lambda(3x|x|\,dx + 3y|y|\,dy).$$
The critical points are therefore defined by the equations
$$\begin{aligned} A &= 3\lambda x|x| \\ B &= 3\lambda y|y| \\ |x|^3 + |y|^3 &= K. \end{aligned}$$
Since
$$Ax + By = 3\lambda|x|^3 + 3\lambda|y|^3 = 3\lambda K,$$
the problem is to find $3\lambda K$. If the equations $|A| = |3\lambda|\,|x|^2$, $|B| = |3\lambda|\,|y|^2$ are raised to the power $3/2$ and added, the result is
$$\begin{aligned} |A|^{3/2} + |B|^{3/2} &= |3\lambda|^{3/2} K, \\ (|A|^{3/2} + |B|^{3/2})\,K^{1/2} &= |3\lambda K|^{3/2}. \end{aligned}$$
Hence, raising to the power $2/3$ gives
$$|3\lambda K| = (|A|^{3/2} + |B|^{3/2})^{2/3}\,(|x|^3 + |y|^3)^{1/3}. \tag{7}$$
If $A = B = 0$ then $\lambda = 0$, and the function is identically zero on $|x|^3 + |y|^3 = K$. Otherwise (7) gives two non-zero values for $\lambda$, and hence two critical points $(x, y)$. At one of them the value of $Ax + By$ is
$$(|A|^{3/2} + |B|^{3/2})^{2/3}\,(|x|^3 + |y|^3)^{1/3},$$
and at the other the value is minus this number. These are clearly the maximum and minimum values, respectively. If $(x, y)$ is a critical point, then
$$\begin{aligned} |A| &= |3\lambda|\,|x|^2, & |B| &= |3\lambda|\,|y|^2 \\ |A|^{1/2} &= |3\lambda|^{1/2}\,|x|, & |B|^{1/2} &= |3\lambda|^{1/2}\,|y| \\ A|A|^{1/2} &= \mu\,x|x|^2, & B|B|^{1/2} &= \mu\,y|y|^2 \end{aligned}$$
where $\mu = 3\lambda|3\lambda|^{1/2}$. These conclusions can all be summarized as follows. For any $(A, B)$ and $(x, y)$ the inequality
$$|Ax + By| \le (|A|^{3/2} + |B|^{3/2})^{2/3}\,(|x|^3 + |y|^3)^{1/3} \tag{8}$$
holds. Equality holds only when there are numbers $\mu_1, \mu_2$, not both zero, such that
$$\mu_1\,(A|A|^{1/2},\ B|B|^{1/2}) = \mu_2\,(x|x|^2,\ y|y|^2).$$
The inequality (8) and the Schwarz inequality are both special cases of the Hölder inequality, which will now be derived.

*The function $|x|^p$ has been defined so far only for rational values of $p$. If $|x|^p$ is defined to be $e^{p\log|x|}$ for $x \ne 0$ and 0 for $x = 0$, then it is differentiable for $p > 1$ and has derivative $px|x|^{p-2}$.


Example 


Let $p$ be any* number greater than 1. Find the maximum values of $A_1x_1 + A_2x_2 + \cdots + A_nx_n$ subject to the constraint $|x_1|^p + |x_2|^p + \cdots + |x_n|^p = K$, where $A_1, A_2, \ldots, A_n, K$ are fixed numbers with $K > 0$. The method of Lagrange multipliers applies and gives the equations
$$\begin{aligned} A_i &= \lambda p\,x_i|x_i|^{p-2} \qquad (i = 1, 2, \ldots, n) \\ |x_1|^p + \cdots + |x_n|^p &= K \end{aligned}$$
for the critical points. Then
$$A_1x_1 + \cdots + A_nx_n = \lambda pK$$
is the value at the critical point, and
$$\begin{aligned} |A_1|^{p/(p-1)} + \cdots + |A_n|^{p/(p-1)} &= |\lambda p|^{p/(p-1)}\,K, \\ (|A_1|^{p/(p-1)} + \cdots + |A_n|^{p/(p-1)})\,K^{1/(p-1)} &= |\lambda pK|^{p/(p-1)}. \end{aligned}$$
Hence the value at the critical point is
$$\pm(|A_1|^{p/(p-1)} + \cdots + |A_n|^{p/(p-1)})^{(p-1)/p}\,(|x_1|^p + \cdots + |x_n|^p)^{1/p}.$$
If the $A$'s are not all zero then there are only two possible values of $\lambda$, each of which corresponds to a single critical point. Thus there are just two critical points, one the maximum and one the minimum. Set $q = p/(p-1)$. Then the critical points satisfy
$$\begin{aligned} |A_i| &= |\lambda p|\,|x_i|^{p-1} \\ |A_i|^{q-1} &= |\lambda p|^{q-1}\,|x_i| \qquad \text{(because } (p-1)(q-1) = 1\text{)} \\ A_i|A_i|^{q-1} &= \mu\,x_i|x_i|^{p-1} \end{aligned}$$
where $\mu = \lambda p\,|\lambda p|^{q-1}$. These conclusions are summarized by:


*So called because it is a form (= homogeneous polynomial = polynomial whose terms all have the same degree) of degree two (quadrat = square = second power). Not to be confused with ‘k-forms’ in any way.


The Hölder Inequality

Let $(x_1, x_2, \ldots, x_n)$ and $(A_1, A_2, \ldots, A_n)$ be given $n$-tuples, and let $p > 1$, $q > 1$ be two numbers such that $(p-1)(q-1) = 1$. Then
$$|A_1x_1 + A_2x_2 + \cdots + A_nx_n| \le |A|_q\,|x|_p \tag{9}$$
where
$$|A|_q = (|A_1|^q + \cdots + |A_n|^q)^{1/q}, \qquad |x|_p = (|x_1|^p + \cdots + |x_n|^p)^{1/p}.$$
Equality holds in (9) if and only if there exist numbers $\mu_1, \mu_2$, not both zero, such that $\mu_1 A|A|^{q-1} = \mu_2\,x|x|^{p-1}$, i.e.
$$\mu_1 A_i|A_i|^{q-1} = \mu_2\,x_i|x_i|^{p-1} \qquad (i = 1, 2, \ldots, n).$$
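The numerical sketch below spot-checks (9) for a few exponents. It also builds the point at which equality holds, by taking $x_i$ proportional to $\operatorname{sign}(A_i)|A_i|^{q-1}$. That choice, made here, is one way to satisfy $\mu_1 A_i|A_i|^{q-1} = \mu_2 x_i|x_i|^{p-1}$. The helper names are hypothetical, and the $A_i$ are assumed not all zero. With $p = 3$, $q = 3/2$ and $n = 2$, the inequality being checked is (8).

```python
import math
import random

def norm(v, r):
    """|v|_r = (|v_1|^r + ... + |v_n|^r)^(1/r)."""
    return sum(abs(t) ** r for t in v) ** (1.0 / r)

def conjugate(p):
    """The exponent q with (p - 1)(q - 1) = 1, namely q = p / (p - 1)."""
    return p / (p - 1)

def equality_point(A, p, K=1.0):
    """x with x_i proportional to sign(A_i)|A_i|^(q-1), scaled to |x_1|^p + ... + |x_n|^p = K.

    Then x_i|x_i|^(p-1) is proportional to A_i|A_i|^(q-1), so equality holds in (9).
    Assumes the A_i are not all zero."""
    q = conjugate(p)
    x = [math.copysign(abs(a) ** (q - 1), a) for a in A]
    scale = (K / sum(abs(t) ** p for t in x)) ** (1.0 / p)
    return [scale * t for t in x]

random.seed(1)
for p in (1.5, 2.0, 3.0, 7.0):
    q = conjugate(p)
    for _ in range(200):
        A = [random.uniform(-3, 3) for _ in range(4)]
        x = [random.uniform(-3, 3) for _ in range(4)]
        lhs = abs(sum(a * t for a, t in zip(A, x)))
        assert lhs <= norm(A, q) * norm(x, p) + 1e-9  # inequality (9)

A = [2.0, -1.0, 0.5]
x = equality_point(A, 3.0)
print(sum(a * t for a, t in zip(A, x)), norm(A, 1.5) * norm(x, 3.0))  # equal up to rounding
```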


Example 


Find the minimum and maximum values of the quadratic form* $Q(x, y) = 3x^2 - 2xy + 4y^2$ on the disk $x^2 + y^2 \le 1$. If a maximum or a minimum occurs in the interior of the disk, then it must be at a point where $dQ = 0$, i.e. $6x\,dx - 2x\,dy - 2y\,dx + 8y\,dy = 0$, i.e.
$$\begin{aligned} 6x - 2y &= 0 \\ -2x + 8y &= 0. \end{aligned}$$
This occurs only at the point $(x, y) = (0, 0)$. If a maximum or a minimum occurs on the boundary $x^2 + y^2 = 1$, it must be at a point where $dQ = \lambda\,d(x^2 + y^2)$ for some $\lambda$, that is,
$$\begin{aligned} 6x - 2y &= 2\lambda x \\ -2x + 8y &= 2\lambda y. \end{aligned} \tag{10}$$
Rewrite these equations as
$$\begin{aligned} (3 - \lambda)x - y &= 0 \\ -x + (4 - \lambda)y &= 0 \end{aligned} \tag{11}$$
and note that $(x, y) \ne (0, 0)$. It follows that the linear map $(x, y) \mapsto ((3 - \lambda)x - y,\ -x + (4 - \lambda)y)$ is not one-to-one, hence that its rank is not two and the matrix of coefficients
$$\begin{pmatrix} 3 - \lambda & -1 \\ -1 & 4 - \lambda \end{pmatrix}$$
has determinant zero. Hence $(3 - \lambda)(4 - \lambda) - 1 = 0$, $\lambda^2 - 7\lambda + 11 = 0$,
$$\lambda = \frac{7 \pm \sqrt{49 - 44}}{2}, \qquad \lambda = \tfrac{1}{2}(7 \pm \sqrt{5}).$$
For each of these two values of $\lambda$, the linear equations (11) are satisfied by all points $(x, y)$ on the line $(3 - \lambda)x - y = 0$ (which is identical to the line $-x + (4 - \lambda)y = 0$). Hence there are two critical points for each of these two values of $\lambda$, namely the two points on the circle $x^2 + y^2 = 1$ which lie on the line (11). This gives four critical points in all. Multiplying the first equation of (10) by $x/2$, the second by $y/2$, and adding gives $Q(x, y) = \lambda(x^2 + y^2) = \lambda$. Hence $\lambda$ is the value of $Q$ at the corresponding critical points. Therefore $\tfrac{1}{2}(7 + \sqrt{5})$ is the greatest value of $Q$ on the disk. On the circle the least value is $\tfrac{1}{2}(7 - \sqrt{5})$, but on the disk the least value is 0. This solves the given problem.

The distribution of values of $Q$ can be seen quite clearly by sketching the level curves $Q = \text{const.}$ The curve $Q = 1$ is defined by a second-degree equation, and hence must be one of the conic sections. It never crosses the circle, because the least value of $Q$ on the circle is $\tfrac{1}{2}(7 - \sqrt{5}) > 1$. But it passes through points inside the circle, e.g. $(\pm 1/\sqrt{3}, 0)$ and $(0, \pm\tfrac{1}{2})$, which means that it can only be an ellipse. Since $Q(ax, ay) = a^2Q(x, y)$, the other level curves of $Q$ are concentric ellipses with center at $(0, 0)$. The largest of these ellipses which touches the circle $x^2 + y^2 = 1$ is $Q = \tfrac{1}{2}(7 + \sqrt{5})$. It touches the circle at the points where the line $(3 - \tfrac{1}{2}(7 + \sqrt{5}))x - y = 0$ intersects it. This line, $y = -\tfrac{1}{2}(1 + \sqrt{5})x$, is therefore the minor axis of the concentric ellipses. The major axis must be the line through $(0, 0)$ perpendicular to this one, which is the line $x = \tfrac{1}{2}(1 + \sqrt{5})y$. This agrees with the observation that it must also be the line
$$(3 - \tfrac{1}{2}(7 - \sqrt{5}))x - y = 0.$$

[Figure: the concentric ellipses $Q = \text{const.}$, with the major and minor axes marked.]
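You can cross-check the values found for $Q(x, y) = 3x^2 - 2xy + 4y^2$ by sampling the unit circle, and the claims about the axes by direct substitution. This is a minimal sketch; the sample count is an arbitrary choice.

```python
import math

def Q(x, y):
    return 3 * x * x - 2 * x * y + 4 * y * y

# Roots of lambda^2 - 7 lambda + 11 = 0, i.e. (7 ± sqrt 5) / 2.
lam_max = (7 + math.sqrt(5)) / 2
lam_min = (7 - math.sqrt(5)) / 2

# Brute-force values of Q on the unit circle for comparison.
samples = 100_000
vals = [Q(math.cos(t), math.sin(t))
        for t in (2 * math.pi * i / samples for i in range(samples))]
print(max(vals), lam_max)
print(min(vals), lam_min)

# Minor axis y = -(1 + sqrt 5)/2 * x: a unit vector along it gives Q = lam_max.
slope = -(1 + math.sqrt(5)) / 2
n = math.hypot(1, slope)
print(Q(1 / n, slope / n), lam_max)

# Major axis x = (1 + sqrt 5)/2 * y: its direction is perpendicular to (1, slope).
major = ((1 + math.sqrt(5)) / 2, 1.0)
print(major[0] * 1 + major[1] * slope)
```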




Example 


Find the minimum and maximum values of the quadratic form $Q(x, y) = Ax^2 + By^2 + 2cxy$ on the disk $x^2 + y^2 \le 1$. As before, a critical point in the interior must satisfy
$$\begin{aligned} Ax + cy &= 0, \\ cx + By &= 0. \end{aligned} \tag{12}$$
Thus $(0, 0)$ is a critical point, and there are no other critical points in the interior unless $AB - c^2 = 0$. In this case either $Q \equiv 0$ or there is a line through the origin where (12) is satisfied. Critical points on the boundary must satisfy the equations
$$\begin{aligned} Ax + cy &= \lambda x \\ cx + By &= \lambda y \\ x^2 + y^2 &= 1 \end{aligned} \tag{13}$$
which imply that the determinant of
$$\begin{pmatrix} A - \lambda & c \\ c & B - \lambda \end{pmatrix}$$
is zero, that is, $\lambda^2 - (A + B)\lambda + (AB - c^2) = 0$. This means that $\lambda$ must be one of the values
$$\lambda_1 = \frac{A + B + \sqrt{(A - B)^2 + 4c^2}}{2}, \qquad \lambda_2 = \frac{A + B - \sqrt{(A - B)^2 + 4c^2}}{2}.$$
These values are real numbers, and they are distinct unless $A = B$ and $c = 0$. If $\lambda_1 \ne \lambda_2$, then each corresponds, as before, to two solutions of (13), giving four critical points on the boundary. If $\lambda_1 = \lambda_2$, then all points of $x^2 + y^2 = 1$ are critical points. Multiplying the first equation of (13) by $x$, the second by $y$, and adding gives $Q(x, y) = \lambda(x^2 + y^2) = \lambda$. Hence $\lambda_1, \lambda_2$ are the values of $Q$ at the critical points. The possibilities can now be enumerated as follows.

$AB - c^2 > 0$. In this case $\lambda_1\lambda_2 = AB - c^2 > 0$, hence $\lambda_1, \lambda_2$ have the same sign. Since $\lambda_2 \le \lambda_1$, there are four possibilities: $0 < \lambda_2 < \lambda_1$, $\lambda_2 < \lambda_1 < 0$, $0 < \lambda_2 = \lambda_1$, or $\lambda_2 = \lambda_1 < 0$.
- In the first case there is a maximum value $\lambda_1$ at the points where the line $(A - \lambda_1)x + cy = 0$ intersects the circle $x^2 + y^2 = 1$, and a minimum value 0 at $(0, 0)$.
- In the second case there is a minimum value $\lambda_2$ at the points where $(A - \lambda_2)x + cy = 0$ intersects the circle, and a maximum value 0 at $(0, 0)$.
- In the third case there is a maximum value $\lambda_1 = \lambda_2 = A = B$ assumed at all points of $x^2 + y^2 = 1$, and a minimum value 0 at $(0, 0)$.
- In the last case the minimum is assumed at all points of $x^2 + y^2 = 1$, and the maximum 0 is assumed at $(0, 0)$.

In all four cases the curves $Q = \text{const.}$ are concentric ellipses (possibly circular). For this reason $Q$ is said to be elliptic when $AB - c^2 > 0$.

$AB - c^2 < 0$. In this case $\lambda_1\lambda_2 = AB - c^2 < 0$, hence $\lambda_1, \lambda_2$ have opposite signs, i.e. $\lambda_1 > 0$, $\lambda_2 < 0$. The maximum value occurs at the points where $(A - \lambda_1)x + cy = 0$ intersects the circle $x^2 + y^2 = 1$, and the minimum value where $(A - \lambda_2)x + cy = 0$ intersects it. The critical point at $(0, 0)$ is neither a maximum nor a minimum. The curves $Q = \text{const.}$ are hyperbolas whose axes are the lines $(A - \lambda_i)x + cy = 0$ $(i = 1, 2)$. $Q$ is said to be hyperbolic if $AB - c^2 < 0$.

$AB - c^2 = 0$. In this case $\lambda_1$ or $\lambda_2$ is zero, so there are three possibilities: $0 = \lambda_2 < \lambda_1$, $\lambda_2 < \lambda_1 = 0$, or $\lambda_1 = \lambda_2 = 0$.
- In the first case $\lambda_1 = A + B$ is the maximum value, and it is assumed at two points on the boundary. The minimum 0 is assumed all along the line $Ax + cy = 0$ (which is the same as $cx + By = 0$).
- In the second case the minimum is assumed at two points on the boundary, and the maximum on $Ax + cy = 0$.
- The third case can occur only when $Q \equiv 0$ and will be ignored.

The level curves $Q = \text{const.}$ are parallel straight lines. This is seen by writing
$$Q(x, y) = \frac{1}{A}(Ax + cy)^2 = \frac{1}{B}(cx + By)^2$$
(the first form when $A \ne 0$, the second when $B \ne 0$). If $AB - c^2 = 0$ then $Q$ is said to be parabolic, because its graph is parabolic.
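The case analysis above reduces to the two roots $\lambda_1 \ge \lambda_2$ and the sign of $AB - c^2 = \lambda_1\lambda_2$. The sketch below packages that bookkeeping. `classify` and `extremes_on_disk` are hypothetical names. The exact test `det == 0` assumes exact inputs; with measured floating-point data you would want a tolerance instead.

```python
import math

def classify(A, B, c):
    """For Q(x, y) = A x^2 + B y^2 + 2 c x y, return (lam1, lam2, kind).

    lam1 >= lam2 are the roots of lam^2 - (A + B) lam + (AB - c^2) = 0, i.e. the
    values of Q at the critical points on x^2 + y^2 = 1; kind is 'elliptic',
    'hyperbolic' or 'parabolic' according to the sign of AB - c^2 = lam1 * lam2.
    """
    root = math.sqrt((A - B) ** 2 + 4 * c * c)
    lam1 = (A + B + root) / 2
    lam2 = (A + B - root) / 2
    det = A * B - c * c
    if det > 0:
        kind = "elliptic"
    elif det < 0:
        kind = "hyperbolic"
    else:
        kind = "parabolic"  # exact zero test: assumes exact (e.g. integer) inputs
    return lam1, lam2, kind

def extremes_on_disk(A, B, c):
    """(max, min) of Q on x^2 + y^2 <= 1: the boundary values lam1, lam2
    compared with the value 0 that Q takes at interior critical points."""
    lam1, lam2, _ = classify(A, B, c)
    return max(lam1, 0.0), min(lam2, 0.0)

print(classify(3, 4, -1))          # the earlier example Q = 3x^2 - 2xy + 4y^2
print(extremes_on_disk(3, 4, -1))
print(classify(1, -1, 0), classify(1, 1, 1))
```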


Example 


Show that the axes of the conic sections $Q = \text{const.}$ (where $Q(x, y)$ is a quadratic form in two variables) are perpendicular to each other. Since $Q$ is symmetric about each of its axes, this is clear geometrically; the problem is to prove it analytically. Let $(x_1, y_1)$ be a critical point of $Q$ and let $\lambda_1 = Q(x_1, y_1)$, i.e.
$$\begin{aligned} Ax_1 + cy_1 &= \lambda_1 x_1 \\ cx_1 + By_1 &= \lambda_1 y_1 \\ x_1^2 + y_1^2 &= 1. \end{aligned} \tag{13'}$$
Let $(x_2, y_2)$ be one of the two points on the circle $x^2 + y^2 = 1$ which lie at right angles to $(x_1, y_1)$, i.e.
$$\begin{aligned} x_2^2 + y_2^2 &= 1 \\ x_1x_2 + y_1y_2 &= 0. \end{aligned}$$
(On the circle, $x_1x + y_1y$ goes from a maximum value of 1 at $(x_1, y_1)$ to $-1$ at $(-x_1, -y_1)$, and is 0 at the two points halfway in between.) Since
$$\begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \end{pmatrix}\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$
it follows that the equations
$$\begin{aligned} x &= ux_1 + vx_2 \\ y &= uy_1 + vy_2 \end{aligned}$$
can be solved for $(u, v)$ as functions of $(x, y)$, namely
$$\begin{aligned} u &= x_1x + y_1y \\ v &= x_2x + y_2y. \end{aligned}$$
Therefore any $(x, y)$ can be written in this form and, in particular,
$$\begin{aligned} Ax_2 + cy_2 &= ux_1 + vx_2 \\ cx_2 + By_2 &= uy_1 + vy_2 \end{aligned}$$
for some pair of numbers $(u, v)$. In fact $u$ is given by
$$u = x_1(Ax_2 + cy_2) + y_1(cx_2 + By_2) = (Ax_1 + cy_1)x_2 + (cx_1 + By_1)y_2 = \lambda_1x_1x_2 + \lambda_1y_1y_2 = 0.$$
Since $u = 0$, the equations (13) are satisfied by $(x_2, y_2, v)$. Hence $(x_2, y_2)$ is a critical point, as was to be shown.
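Here is a numerical version of the argument. Take a critical point $(x_1, y_1)$ for $\lambda_1$, rotate it through a right angle, and check that the result satisfies (13) with $\lambda = \lambda_2$. The coefficients $A = 2$, $B = -1$, $c = 1.5$ are an arbitrary example chosen here, and `boundary_critical_point` is a hypothetical helper.

```python
import math

def boundary_critical_point(A, B, c, lam):
    """A unit (x, y) with A x + c y = lam x and c x + B y = lam y.

    For a root lam the matrix [[A - lam, c], [c, B - lam]] is singular; a nonzero
    row (p, q) gives the line p x + q y = 0, which is spanned by (-q, p)."""
    rows = [(A - lam, c), (c, B - lam)]
    p, q = max(rows, key=lambda r: abs(r[0]) + abs(r[1]))
    if p == 0 and q == 0:
        return (1.0, 0.0)  # lam1 == lam2: every point of the circle is critical
    n = math.hypot(p, q)
    return (-q / n, p / n)

A, B, c = 2.0, -1.0, 1.5
root = math.sqrt((A - B) ** 2 + 4 * c * c)
lam1, lam2 = (A + B + root) / 2, (A + B - root) / 2

x1, y1 = boundary_critical_point(A, B, c, lam1)
x2, y2 = -y1, x1  # a point of the circle at right angles to (x1, y1)

# The argument in the text says (x2, y2) satisfies (13) with lambda = lambda2.
residual = max(abs(A * x2 + c * y2 - lam2 * x2), abs(c * x2 + B * y2 - lam2 * y2))
print(x1 * x2 + y1 * y2, residual)
```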


Example 


Find the minimum and maximum values of the quadratic form $Q(x, y, z) = Ax^2 + By^2 + Cz^2 + 2ayz + 2bxz + 2cxy$ on the ball $x^2 + y^2 + z^2 \le 1$. As in the case of two variables, the critical points in the interior are the solutions of
$$\begin{aligned} Ax + cy + bz &= 0 \\ cx + By + az &= 0 \\ bx + ay + Cz &= 0 \end{aligned} \tag{14}$$
and the critical points on the boundary are the solutions of
$$\begin{aligned} Ax + cy + bz &= \lambda x \\ cx + By + az &= \lambda y \\ bx + ay + Cz &= \lambda z \\ x^2 + y^2 + z^2 &= 1. \end{aligned} \tag{15}$$
(The 1-form $d(x^2 + y^2 + z^2)$ is never zero on the sphere $x^2 + y^2 + z^2 = 1$.) As before, these equations imply that
$$\begin{vmatrix} A - \lambda & c & b \\ c & B - \lambda & a \\ b & a & C - \lambda \end{vmatrix} = 0. \tag{16}$$
This gives a polynomial of degree 3 which must be satisfied by $\lambda$ in (15). As before, $\lambda = Q(x, y, z)$ at any point which satisfies (15). It is intuitively plausible that $Q$ must assume a maximum and a minimum value on $x^2 + y^2 + z^2 \le 1$; if it does, each of these values must be either 0 or a root of the polynomial (16). The solution of the given problem is therefore reduced to finding the largest and smallest roots of the cubic polynomial (16) and comparing them to zero.
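You can find the largest and smallest roots of (16) in closed form. The sketch below expands the determinant as $\lambda^3 - t\lambda^2 + s\lambda - d$, where $t$ is the trace, $s$ the sum of the principal $2 \times 2$ minors and $d$ the determinant. It then applies the trigonometric form of the cubic formula. That form is valid because all three roots are real for a symmetric array of coefficients, a standard fact assumed here. The function name and the tolerance for detecting a triple root are choices made for this sketch.

```python
import math

def critical_values(A, B, C, a, b, c):
    """Roots of the cubic (16) for Q = Ax^2 + By^2 + Cz^2 + 2ayz + 2bxz + 2cxy,
    sorted from largest to smallest.

    det(M - lam I) = 0 with M = [[A, c, b], [c, B, a], [b, a, C]] expands to
    lam^3 - t lam^2 + s lam - d = 0 (t = trace, s = sum of 2x2 principal minors,
    d = det M)."""
    t = A + B + C
    s = (A * B - c * c) + (A * C - b * b) + (B * C - a * a)
    d = A * (B * C - a * a) - c * (c * C - a * b) + b * (c * a - B * b)
    # Substituting lam = mu + t/3 gives the depressed cubic mu^3 + P mu + R = 0.
    P = s - t * t / 3
    R = -2 * t ** 3 / 27 + t * s / 3 - d
    if abs(P) < 1e-14:
        return [t / 3] * 3  # triple root: M is a multiple of the identity
    m = 2 * math.sqrt(-P / 3)
    arg = max(-1.0, min(1.0, 3 * R / (P * m)))  # clamp rounding just outside [-1, 1]
    theta = math.acos(arg) / 3
    roots = [t / 3 + m * math.cos(theta - 2 * math.pi * k / 3) for k in range(3)]
    return sorted(roots, reverse=True)

# Example: Q = 3x^2 + 4y^2 + 5z^2 - 2xy (A, B, C = 3, 4, 5; a = b = 0; c = -1).
roots = critical_values(3, 4, 5, 0, 0, -1)
print(roots)  # 5 and (7 ± sqrt 5)/2
print("max on ball:", max(roots[0], 0.0), "min on ball:", min(roots[-1], 0.0))
```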


Example 


Given $Q(x, y, z)$ as above, show that there exist three mutually perpendicular lines through the origin whose six points of intersection with $x^2 + y^2 + z^2 = 1$ are critical points of $Q$ on $x^2 + y^2 + z^2 \le 1$. Take it for granted that $Q$ must assume maximum and minimum values on $x^2 + y^2 + z^2 = 1$. It follows that the equations (15) must have at least two solutions. Let $(x_1, y_1, z_1)$ be a solution. Consider now the problem of finding the maximum and minimum values of $Q$ on the circle
$$\begin{aligned} x^2 + y^2 + z^2 &= 1 \\ x_1x + y_1y + z_1z &= 0, \end{aligned}$$
which is the ‘equator’ when the ‘poles’ are $\pm(x_1, y_1, z_1)$. The method of Lagrange multipliers applies (see Exercise 13). It implies that a maximum or a minimum of $Q$ on the circle can occur only at a point $(x, y, z)$ for which there are numbers $\lambda_1, \lambda_2$ such that
$$\begin{aligned} Ax + cy + bz &= \lambda_1 x + \lambda_2 x_1/2 \\ cx + By + az &= \lambda_1 y + \lambda_2 y_1/2 \\ bx + ay + Cz &= \lambda_1 z + \lambda_2 z_1/2 \\ x^2 + y^2 + z^2 &= 1 \\ x_1x + y_1y + z_1z &= 0. \end{aligned}$$
Since the function $Q$ must have a maximum and a minimum on the circle, these equations must have at least two solutions. Let $(x_2, y_2, z_2, \lambda_1, \lambda_2)$ be a solution. Multiplying the first equation by $x_1$, the second by $y_1$, the third by $z_1$, and adding gives
$$\begin{aligned} \lambda_2/2 &= x_1(Ax_2 + cy_2 + bz_2) + y_1(cx_2 + By_2 + az_2) + z_1(bx_2 + ay_2 + Cz_2) \\ &= (Ax_1 + cy_1 + bz_1)x_2 + (cx_1 + By_1 + az_1)y_2 + (bx_1 + ay_1 + Cz_1)z_2 \\ &= \lambda x_1x_2 + \lambda y_1y_2 + \lambda z_1z_2 = \lambda(x_1x_2 + y_1y_2 + z_1z_2) = 0, \end{aligned}$$
where $\lambda = Q(x_1, y_1, z_1)$. Therefore $\lambda_2 = 0$ and $(x_2, y_2, z_2, \lambda_1)$ is a solution of (15); that is, $(x_2, y_2, z_2)$ is a critical point of the original problem. Now let $(x_3, y_3, z_3)$ be one of the two solutions of
$$\begin{aligned} x_1x + y_1y + z_1z &= 0 \\ x_2x + y_2y + z_2z &= 0 \\ x^2 + y^2 + z^2 &= 1. \end{aligned}$$
Then
$$\begin{pmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{pmatrix}\begin{pmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17}$$
Hence there exist numbers $u, v, w$ such that
$$\begin{aligned} Ax_3 + cy_3 + bz_3 &= ux_1 + vx_2 + wx_3 \\ cx_3 + By_3 + az_3 &= uy_1 + vy_2 + wy_3 \\ bx_3 + ay_3 + Cz_3 &= uz_1 + vz_2 + wz_3. \end{aligned}$$
Multiplying by $x_1, y_1, z_1$, respectively, and adding gives $u = 0$ as before. Similarly $v = 0$, so $(x_3, y_3, z_3, w)$ is a solution of (15); that is, $(x_3, y_3, z_3)$ is a critical point of the original problem. The lines through the three points $(x_i, y_i, z_i)$ $(i = 1, 2, 3)$ therefore have the desired properties.
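The construction above can be imitated numerically once the three roots of (16) are known. Let $M$ be the symmetric array of coefficients of $Q$. For a root $\lambda$ that is not repeated, $M - \lambda I$ has rank 2. The cross product of two independent rows of $M - \lambda I$ therefore solves the first three equations of (15), and normalizing it gives a critical point. The example form $Q = 2x^2 + 2y^2 + 5z^2 + 2xy$ (whose roots are 1, 3, 5) and the helper names are choices made for this sketch.

```python
import itertools
import math

def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def unit_critical_point(M, lam):
    """A unit solution of (15) for a simple root lam: a null vector of M - lam I,
    taken as the longest cross product of two rows of M - lam I."""
    rows = [[M[i][j] - (lam if i == j else 0.0) for j in range(3)] for i in range(3)]
    best = max((cross(rows[i], rows[j]) for i, j in itertools.combinations(range(3), 2)),
               key=lambda v: sum(t * t for t in v))
    n = math.sqrt(sum(t * t for t in best))
    return tuple(t / n for t in best)

def Q(M, p):
    return sum(M[i][j] * p[i] * p[j] for i in range(3) for j in range(3))

# Q = 2x^2 + 2y^2 + 5z^2 + 2xy, i.e. A = B = 2, C = 5, c = 1, a = b = 0;
# its cubic (16) has the distinct roots 1, 3, 5.
M = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 5.0]]
lams = (1.0, 3.0, 5.0)
points = [unit_critical_point(M, lam) for lam in lams]
for (i, p), (j, r) in itertools.combinations(enumerate(points), 2):
    print(i, j, sum(a * b for a, b in zip(p, r)))  # ~0: mutually perpendicular
for lam, p in zip(lams, points):
    print(lam, Q(M, p))  # lambda_i = Q(x_i, y_i, z_i)
```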


Example

Find all critical points of $Q(x, y, z)$ on $x^2 + y^2 + z^2 \le 1$. Let $(x_i, y_i, z_i)$ $(i = 1, 2, 3)$ be mutually perpendicular critical points of $Q$ as above, and let $\lambda_i = Q(x_i, y_i, z_i)$. Then $Q$ has at least the seven critical points $(0, 0, 0)$ and $\pm(x_i, y_i, z_i)$ $(i = 1, 2, 3)$, at which the values of $Q$ are $0, \lambda_1, \lambda_2, \lambda_3$.

By (17) the equations
$$\begin{aligned} x &= ux_1 + vx_2 + wx_3 \\ y &= uy_1 + vy_2 + wy_3 \\ z &= uz_1 + vz_2 + wz_3 \end{aligned} \tag{18}$$
have the unique solution
$$\begin{aligned} u &= x_1x + y_1y + z_1z \\ v &= x_2x + y_2y + z_2z \\ w &= x_3x + y_3y + z_3z. \end{aligned} \tag{19}$$
Now suppose $(x, y, z)$ is a critical point of $Q(x, y, z)$, that is,
$$\begin{aligned} Ax + cy + bz &= \lambda x \\ cx + By + az &= \lambda y \\ bx + ay + Cz &= \lambda z \end{aligned}$$
for some $\lambda$. Then substituting (18) in these equations gives
$$\begin{aligned} u\lambda_1x_1 + v\lambda_2x_2 + w\lambda_3x_3 &= \lambda ux_1 + \lambda vx_2 + \lambda wx_3 \\ u\lambda_1y_1 + v\lambda_2y_2 + w\lambda_3y_3 &= \lambda uy_1 + \lambda vy_2 + \lambda wy_3 \\ u\lambda_1z_1 + v\lambda_2z_2 + w\lambda_3z_3 &= \lambda uz_1 + \lambda vz_2 + \lambda wz_3, \end{aligned}$$
using the fact that $(x_i, y_i, z_i, \lambda_i)$ satisfies (15). By the uniqueness of the solution of (18) this gives
$$u\lambda_1 = u\lambda, \qquad v\lambda_2 = v\lambda, \qquad w\lambda_3 = w\lambda.$$
If the numbers $0, \lambda_1, \lambda_2, \lambda_3$ are distinct, this implies that at least two of the numbers $u, v, w$ are zero. Thus in this case the critical point $(x, y, z)$ must be a multiple of $(x_i, y_i, z_i)$ for $i = 1$, 2, or 3. Since the multiple must be 0 or $\pm 1$, this shows that if $0, \lambda_1, \lambda_2, \lambda_3$ are distinct then there are exactly seven critical points. If two or more of the numbers $0, \lambda_1, \lambda_2, \lambda_3$ coincide, then the set of critical points also includes one of the following, according to which numbers coincide:
- a line segment through the interior ($\lambda_i = 0$);
- a great circle on the sphere ($\lambda_i = \lambda_j$);
- a disk through the interior ($0 = \lambda_i = \lambda_j$);
- the entire sphere ($\lambda_1 = \lambda_2 = \lambda_3$);
- the entire ball ($Q \equiv 0$).

Example

Determine whether a quadratic form $Q(x, y, z)$ has a unique minimum value at the origin $(0, 0, 0)$, i.e. determine whether $Q(x, y, z) > 0$ for $(x, y, z) \ne (0, 0, 0)$, without solving the cubic polynomial (16). (If $Q(x, y, z) > 0$ for $(x, y, z) \ne (0, 0, 0)$ then $Q$ is said to be positive definite. This notion is important in physics because it is the necessary and sufficient condition for $(0, 0, 0)$ to be a point of stable equilibrium.) This problem is solved in the exercises. The solution is that $Q(x, y, z)$ is positive definite if and only if
$$\begin{vmatrix} A & c & b \\ c & B & a \\ b & a & C \end{vmatrix} > 0, \qquad \begin{vmatrix} A & c \\ c & B \end{vmatrix} > 0 \qquad \text{and} \qquad A > 0.$$
(The obvious generalization to quadratic forms in $n$ variables is also true. It is particularly important for $n > 3$, because the generalization of the cubic (16) to $n$ variables is a polynomial of degree $n$, and it can be very difficult to determine whether or not such a polynomial has any negative roots.)

Exercises

1 Find the point of the line
$$\begin{aligned} 3x - y + 2z &= 13 \\ x + 4y - 7z &= 36 \end{aligned}$$
which lies nearest the origin. What is the general procedure for finding the point of the line
$$\begin{aligned} A_1x + B_1y + C_1z &= D_1 \\ A_2x + B_2y + C_2z &= D_2 \end{aligned} \tag{$*$}$$
nearest the origin? Under what circumstances would the method of Lagrange multipliers not apply to the constraints (*)?


2 Snell’s law. In the accompanying diagram let $l$ represent the water level, let $A$ be a point above the water and let $B$ be a point below the water level. Let $v_1$ be the velocity of light in air and $v_2$ the velocity of light in water. Show that if the time required for light to travel from the point $A$ to the point $B$ along the path indicated is minimal, then
$$\frac{b_1}{v_1\sqrt{a_1^2 + b_1^2}} = \frac{b_2}{v_2\sqrt{a_2^2 + b_2^2}}.$$
In other words,
$$\frac{\sin\alpha_1}{\sin\alpha_2} = \frac{v_1}{v_2}.$$

[Figure for Exercise 2: the light path from $A$ above the water level $l$ to $B$ below it, with distances labeled $a_1, b_1, a_2, b_2$.]
3 A rectangular box with an open top is to have total surface area 24 (four sides and the bottom) and to have the largest possible volume. Find the dimensions of the box.


4 Find the maximum and minimum values of $xyz$ on the sphere $x^2 + y^2 + z^2 = 1$ and all points at which they are assumed.


5 Let two attracting particles be constrained to lie on curves, the first on a curve $f(x, y) = 0$ and the other on the circle $u^2 + v^2 = 1$. Assume that $f(x, y) = 0$ lies entirely outside the circle. Show that the equilibrium positions of the particle on $f(x, y) = 0$ are the same as they would be if the other particle were fixed at $(0, 0)$. Show also that the corresponding equilibrium position of the particle on the circle is the point of the circle which lies on the line segment from $(0, 0)$ to the particle on the curve.


6 (a) Find the maximum value of $xy$ on the line $x + y = 1$.
(b) Does $xyz$ have a maximum value on $x + y + z = 1$?
(c) Find all critical points of $xyz$ on $x + y + z = 1$.
(d) Let $A, B, C$ be given positive numbers. Suppose $x^Ay^Bz^C$ assumes a maximum value on the set where $x > 0$, $y > 0$, $z > 0$, $Ax + By + Cz = 1$. What must this maximum value be?

(e) Describe this set $\{x > 0,\ y > 0,\ z > 0,\ Ax + By + Cz = 1\}$ geometrically and evaluate $x^Ay^Bz^C$ on its boundary. Assuming there is a minimum, does it occur in the interior or on the boundary?

(f) Find the maximum value of $x_1^{A_1}x_2^{A_2}\cdots x_n^{A_n}$ on $A_1x_1 + \cdots + A_nx_n = K$ $(A_i > 0,\ x_i > 0,\ K > 0)$ and state the result as an inequality. [First assume $\sum A_i = 1$ and then reduce the general case to this case.]

(g) Compare the geometric mean $\sqrt[n]{x_1x_2\cdots x_n}$ and the arithmetic mean $\frac{1}{n}(x_1 + x_2 + \cdots + x_n)$ of a set of $n$ positive numbers.

(h) When can equality hold in the inequality of (f)?

7 (a) Sketch very roughly the curve $|x|^p + |y|^p = 1$ for

(b) Let $|(x, y)|_\infty = \lim_{p\to\infty} |(x, y)|_p = \lim_{p\to\infty} \big[|x|^p + |y|^p\big]^{1/p}$. How else can $|(x, y)|_\infty$ be described?

(c) For very large values of $p$, approximately where on the curve $|x|^p + |y|^p = 1$ does $3x + 10y$ assume its largest value, and approximately what is that value?

(d) Approximately where does $2x - 7y$ assume its largest value on $|x|^p + |y|^p = 1$ ($p$ large), and approximately what is that value?

(e) What is $\lim_{p\to\infty}$ of the largest value of $Ax + By$ on $|x|^p + |y|^p = 1$, and near what point or points does $Ax + By$ have this value? Draw a diagram. Consider in particular the case $A = 0$, $B \ne 0$.


8 (a) Find the maximum and minimum values of a linear form $A_1x_1 + A_2x_2 + \cdots + A_nx_n$ on the ‘cube’ $|x_i| \le 1$ $(i = 1, 2, \ldots, n)$.

(b) At what points is the maximum assumed?

(c) Comparing to Exercise 7, state the Hölder inequality in the case $p = 1$, $q = \infty$.

(d) What is the maximum value of $A_1x_1 + A_2x_2 + \cdots + A_nx_n$ on the ‘octahedron’ $|x_1| + |x_2| + \cdots + |x_n| = 1$?


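9 (a) Prove that for any two points $(x_1, x_2)$ and $(y_1, y_2)$ in the plane the inequality
$$\big[(x_1 + y_1)^2 + (x_2 + y_2)^2\big]^{1/2} \le \big[x_1^2 + x_2^2\big]^{1/2} + \big[y_1^2 + y_2^2\big]^{1/2}$$
holds.
(b) When does equality hold?
(c) Show that for any two $n$-tuples $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$ the inequality $|x + y|_p \le |x|_p + |y|_p$ holds; that is,
$$\Big[\sum_{i=1}^{n} |x_i + y_i|^p\Big]^{1/p} \le \Big[\sum_{i=1}^{n} |x_i|^p\Big]^{1/p} + \Big[\sum_{i=1}^{n} |y_i|^p\Big]^{1/p}.$$
This is known as Minkowski’s Inequality.
(d) When does equality hold?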


10 Is the quadratic form
$$Q(x, y) = 5x^2 + 6xy + 2y^2$$
elliptic, hyperbolic, or parabolic? Sketch the curves $Q = \text{const.}$, showing the axes and the maximum and minimum values of $Q$ on the circle $x^2 + y^2 = 1$.


11 Sketch the curves $Q = \text{const.}$ for
$$Q(x, y) = 3x^2 + 4xy + y^2$$
as in Exercise 10.


12 If $Q(x, y, z)$ is a quadratic form in three variables, what are the possible configurations of the surfaces $Q = \text{const.}$? Give examples and verbal descriptions. [The terms ‘ellipsoid’, ‘hyperboloid of one sheet’ and ‘hyperboloid of two sheets’ will be useful.]


13 Show that the constraints
$$\begin{aligned} x^2 + y^2 + z^2 &= 1 \\ x_1x + y_1y + z_1z &= 0 \end{aligned}$$
($x_1, y_1, z_1$ = numbers not all zero) are non-singular of rank 2.

14 Let $Q(x, y, z)$ be a quadratic form in three variables, and let $(x_i, y_i, z_i)$ $(i = 1, 2, 3)$ be mutually orthogonal critical points of $Q$. The numbers $(u, v, w)$ in (18) can be regarded as new coordinates on $\mathbb{R}^3$.

(a) Find the equation of the sphere $x^2 + y^2 + z^2 = 1$ in $uvw$-coordinates.

(b) Find the expression of $Q(x, y, z)$ in $uvw$-coordinates.

(c) Find all critical points of $Q$ on the ball $u^2 + v^2 + w^2 \le 1$ in $uvw$-coordinates.

15 Prove the test for positive definiteness stated in the text.

(a) Let $Q$ be a quadratic form in $n$ variables ($n = 1, 2, 3$). Prove first that the polynomial analogous to (16) has a root $< 0$ if its constant term is $< 0$. Conclude that if $Q$ is positive definite, then all the determinants in the test must be positive.

(b) The proof of the converse is by induction. Since $A > 0$ is necessary and sufficient for $Ax^2$ to be positive definite, the test works for $n = 1$. For $n = 2$ the formula $AB - c^2 = \lambda_1\lambda_2$ shows that if the determinant $AB - c^2$ is positive, then either $\lambda_1, \lambda_2$ are both positive or both are negative. If both are negative, then the maximum value on $x^2 + y^2 = 1$ is negative, hence $A$ is negative and the desired conclusion follows. For $n = 3$, suppose the numbers $\lambda_1, \lambda_2, \lambda_3$ are distinct. Then the polynomial $(\lambda_1 - \lambda)(\lambda_2 - \lambda)(\lambda_3 - \lambda)$ has the same roots and the same leading coefficient as the polynomial (16), so the two must be identical. Equating their constant terms gives
$$\lambda_1\lambda_2\lambda_3 = \begin{vmatrix} A & c & b \\ c & B & a \\ b & a & C \end{vmatrix}.$$
This argument is no longer valid if two of the $\lambda_i$ coincide, but the formula still holds. It can be proved by proving
$$\begin{pmatrix} A & c & b \\ c & B & a \\ b & a & C \end{pmatrix}\begin{pmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{pmatrix}\begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}.$$
Thus if the determinant is positive, then one of two things holds. Either $Q$ is positive at all critical points on the sphere (and hence has a positive minimum), or two of the numbers $\lambda_1, \lambda_2, \lambda_3$ are negative. But if two of them are negative (say $\lambda_1, \lambda_2$), then $Q$ is easily shown to be negative on the entire circle
$$\begin{aligned} x &= ux_1 + vx_2 \\ y &= uy_1 + vy_2 \\ z &= uz_1 + vz_2 \\ x^2 + y^2 + z^2 &= 1, \end{aligned}$$
and hence to be negative somewhere on the plane $z = 0$; that is, $Q(x, y, 0)$ is not positive definite. Therefore if $Q(x, y, 0)$ is positive definite and the determinant is positive, then $Q(x, y, z)$ is positive definite.


16 Give a complete proof ab initio that if $Q(x_1, x_2, \dots, x_n)$ is a quadratic form in n variables, say

$$Q(x_1, \dots, x_n) = \sum_{i,j=1}^{n} A_{ij} x_i x_j$$

where $A_{ij} = A_{ji}$, if $Q(x_1, \dots, x_{n-1}, 0)$ is a positive definite quadratic form in n − 1 variables, and if $\det(A_{ij}) > 0$, then Q is positive definite.
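The test for positive definiteness used above (A > 0, $AB - c^2 > 0$, and the 3 × 3 determinant positive, that is, all leading principal minors of the symmetric coefficient matrix positive) can be applied mechanically. The following Python sketch is an illustration only: the helper names `det`, `leading_minors`, `is_positive_definite`, and `quadratic_form` are hypothetical, exact rational arithmetic via `fractions.Fraction` is an assumed design choice so that rounding cannot disturb the sign tests, and it assumes $Q = Ax^2 + By^2 + Cz^2 + 2ayz + 2bxz + 2cxy$, so that the coefficient matrix has rows (A, c, b), (c, B, a), (b, a, C).

```python
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in M]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result                      # a row swap changes the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def leading_minors(M):
    """Determinants of the upper-left 1 x 1, 2 x 2, ..., n x n blocks of M."""
    return [det([row[:k] for row in M[:k]]) for k in range(1, len(M) + 1)]

def is_positive_definite(M):
    """The determinant test: all leading minors positive (M assumed symmetric)."""
    return all(d > 0 for d in leading_minors(M))

def quadratic_form(M, x):
    """Q(x) = sum over i, j of M[i][j] * x_i * x_j."""
    return sum(M[i][j] * x[i] * x[j] for i in range(len(x)) for j in range(len(x)))

# Q = 2x^2 + 2y^2 + 2z^2 + 2xy + 2yz, i.e. A = B = C = 2, c = 1, a = 1, b = 0
M = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
print(leading_minors(M), is_positive_definite(M))   # minors 2, 3, 4; positive definite
```

For the form $x^2 + 4xy + y^2$, with matrix rows (1, 2), (2, 1), the second minor is 1 − 4 = −3, so the test fails, and indeed Q(1, −1) = −2.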


17 A quadratic form in n variables is said to be positive 
semi-definite if it assumes no negative values. State and prove 
a necessary and sufficient condition for a quadratic form to 
be positive semi-definite. 


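18 Second derivative test. Let F(x, y, z) be a twice differentiable function, that is, a function whose first partial derivatives have (continuous) first partial derivatives, and let $(\bar x, \bar y, \bar z)$ be a critical point of F. Let

$$A = \frac12 \frac{\partial^2 F}{\partial x^2}(\bar x, \bar y, \bar z), \quad B = \frac12 \frac{\partial^2 F}{\partial y^2}(\bar x, \bar y, \bar z), \quad C = \frac12 \frac{\partial^2 F}{\partial z^2}(\bar x, \bar y, \bar z),$$
$$a = \frac12 \frac{\partial^2 F}{\partial y\,\partial z}(\bar x, \bar y, \bar z), \quad b = \frac12 \frac{\partial^2 F}{\partial x\,\partial z}(\bar x, \bar y, \bar z), \quad c = \frac12 \frac{\partial^2 F}{\partial x\,\partial y}(\bar x, \bar y, \bar z)$$

(using the equality of the mixed partials; see Exercise 6,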


Chapter5 | Differential Calculus 188 


§3.2), and let Q(x, y, z) be the corresponding quadratic form. 
(a) Show that

$$F(\bar x + su, \bar y + sv, \bar z + sw) = F(\bar x, \bar y, \bar z) + s^2 Q(u, v, w) + o(s^2),$$

that is, show that given a tolerance $\epsilon > 0$ and a bound B > 0 there is an S > 0 such that

$$\left| \frac{F(\bar x + su, \bar y + sv, \bar z + sw) - F(\bar x, \bar y, \bar z)}{s^2} - Q(u, v, w) \right| < \epsilon$$

whenever |u| < B, |v| < B, |w| < B, 0 < s < S. [Estimate the first partials of F with an error less than $\epsilon s$, then the value of F with an error less than $\epsilon s^2$.]

(b) Conclude that if Q is positive definite then there is a ball $(x - \bar x)^2 + (y - \bar y)^2 + (z - \bar z)^2 < \delta^2$ on which $F(\bar x, \bar y, \bar z)$ is the unique minimum value of F.

(c) Conclude that if there is a ball $(x - \bar x)^2 + (y - \bar y)^2 + (z - \bar z)^2 < \delta^2$ on which $F(\bar x, \bar y, \bar z)$ is the unique minimum value of F then Q is positive semi-definite.

(d) Suppose the numbers

$$\Delta_3 = \det\begin{pmatrix} A & c & b \\ c & B & a \\ b & a & C \end{pmatrix}, \qquad \Delta_2 = \det\begin{pmatrix} A & c \\ c & B \end{pmatrix}, \qquad \Delta_1 = A$$

are known. For which values $(\Delta_3, \Delta_2, \Delta_1)$ can one conclude that there is a ball $(x - \bar x)^2 + (y - \bar y)^2 + (z - \bar z)^2 < \delta^2$ on which $F(\bar x, \bar y, \bar z)$ is a unique minimum of F? For which values $(\Delta_3, \Delta_2, \Delta_1)$ can one conclude that this is not the case? For which values $(\Delta_3, \Delta_2, \Delta_1)$ can neither conclusion be drawn?
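A numerical sketch of the sufficient condition in part (b): approximate A, B, C, a, b, c at a critical point by central differences, form $\Delta_1, \Delta_2, \Delta_3$, and check whether all three are positive. This is an illustration under stated assumptions: the step size `h`, the tolerance `tol`, and the helper names are hypothetical choices, and finite differences only approximate the second partials, so borderline cases (some $\Delta_i$ near 0) cannot be decided this way. The sketch does not answer part (d).

```python
def half_hessian(F, p, h=1e-4):
    """Approximate [[A, c, b], [c, B, a], [b, a, C]] (half the second partials) at p."""
    n = len(p)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def value(di, dj):
                q = list(p)
                q[i] += di
                q[j] += dj          # when i == j this steps 2*h along one axis
                return F(*q)
            second = (value(h, h) - value(h, -h) - value(-h, h) + value(-h, -h)) / (4 * h * h)
            H[i][j] = 0.5 * second
    return H

def deltas(M):
    """Leading principal minors Delta_1, Delta_2, Delta_3 of a 3 x 3 matrix."""
    d1 = M[0][0]
    d2 = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    d3 = (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    return d1, d2, d3

def passes_positive_definite_test(F, p, tol=1e-6):
    """True if every Delta_i exceeds tol, i.e. Q appears positive definite."""
    return all(d > tol for d in deltas(half_hessian(F, p)))

F = lambda x, y, z: x * x + y * y + z * z + x * y - y * z   # critical point at the origin
print(deltas(half_hessian(F, (0.0, 0.0, 0.0))))   # approximately (1.0, 0.75, 0.5)
```

For this F one has A = B = C = 1, c = 1/2, a = −1/2, b = 0, so all three minors are positive and part (b) gives a strict local minimum at the origin; for $x^2 - y^2 + z^2$ the second minor is negative and the test fails.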


19 Isoperimetric inequality. Given n points $P_1 = (x_1, y_1), P_2 = (x_2, y_2), \dots, P_n = (x_n, y_n)$ in the xy-plane, set $P_0 = P_n$, and imagine $P_1P_2 \dots P_n$ as describing a closed, oriented, n-sided polygonal curve in the plane. The length of this curve is

$$L = \sum_{i=1}^{n} |P_{i-1}P_i| = \sum_{i=1}^{n} \sqrt{(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2}$$

and the oriented area it encloses is

$$A = \sum_{i=1}^{n} [\text{oriented area of } OP_{i-1}P_i] = \frac12 \sum_{i=1}^{n} \det\begin{pmatrix} x_{i-1} & y_{i-1} \\ x_i & y_i \end{pmatrix}.$$

Show that if a given n-gon has a maximum (or minimum) value of A among all polygons with the same L then it must




be a regular n-gon. Hence, assuming that the problem A = max., L = const. has a solution, the value of A for any n-gon is at most the area of the regular n-gon with the same L, that is,

$$A \le \frac{1}{4n}\left(\cot\frac{\pi}{n}\right) L^2.$$

As $n \to \infty$ this gives

$$A \le \frac{1}{4\pi} L^2$$

as an upper bound on the area A of a polygon (of any number of sides) whose perimeter is L. This is called the isoperimetric inequality. [If n = 2 the inequality $A \le \frac{1}{4n}\left(\cot\frac{\pi}{n}\right) L^2$ holds trivially. Suppose it has been proved for all values less than a given value of n, and suppose a polygon $(x_1, y_1, x_2, y_2, \dots, x_n, y_n)$ is given such that A = max. for L = const. It is to be shown that this must be a regular n-gon. First use the inductive hypothesis and the fact that $\frac{1}{4n}\left(\cot\frac{\pi}{n}\right) L^2$ increases as n increases to show that the given n-gon (assumed to be a maximum) cannot have the property that two consecutive vertices $(x_{i-1}, y_{i-1})$, $(x_i, y_i)$ coincide. Conclude that L is differentiable at the point $(x_1, y_1, x_2, y_2, \dots, x_n, y_n)$ and that the method of Lagrange multipliers applies to the problem A = max., L = const. Express the resulting equations $dA = \lambda\,dL$ in terms of $u_i = x_i - x_{i-1}$, $v_i = y_i - y_{i-1}$, and $l_i = \sqrt{u_i^2 + v_i^2}$. Derive two different expressions for

$$(u_{i+1} + u_i)(u_{i+1} - u_i) + (v_{i+1} + v_i)(v_{i+1} - v_i)$$

and conclude that $l_i = l_{i+1}$, i.e. all sides of the given polygon must have the same length l = L/n. Then simplify and solve for $u_{i+1}, v_{i+1}$ as functions of $u_i, v_i, \lambda$ (and n, L) of the form

$$\begin{pmatrix} u_{i+1} \\ v_{i+1} \end{pmatrix} = M(\lambda)\begin{pmatrix} u_i \\ v_i \end{pmatrix}$$

where $M(\lambda)$ is a 2 × 2 matrix depending on λ. Use the formulas for $M(\lambda)$ to conclude that

$$M(\lambda) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

for some number θ in the interval {0 < θ < 2π}. From

$$M(\lambda)^n \begin{pmatrix} u_1 \\ v_1 \end{pmatrix} = \begin{pmatrix} u_1 \\ v_1 \end{pmatrix}$$

conclude that $\theta = j \cdot 2\pi/n$ for some integer j, 0 < j < n.


Given θ, which must be one of these n − 1 values, the first side $(x_0, y_0), (x_1, y_1)$ of the polygon determines the rest of the polygon. Find A as a function of j and of

$$L = n\sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2}.$$

Conclude that j = 1 and that therefore the polygon is a regular n-gon.]


5.5


Summary. Differentiable Manifolds


A function $f: \mathbb{R}^n \to \mathbb{R}^m$ is said to be differentiable* if it has the property that its mn first partial derivatives exist and are continuous. That is, a function

$$y_i = f_i(x_1, x_2, \dots, x_n) \qquad (i = 1, 2, \dots, m) \tag{1}$$

is differentiable if

$$\lim_{h \to 0} \frac{f_i(x_1, \dots, x_{j-1}, x_j + h, x_{j+1}, \dots, x_n) - f_i(x_1, x_2, \dots, x_n)}{h}$$

exists and depends continuously on $(x_1, x_2, \dots, x_n)$ for all i, j.

*The concept defined here is also called ‘continuously differentiable’ or ‘$C^1$-differentiable’. In this book the word ‘differentiable’ is used only in this sense unless otherwise stated.

When this is the case, these limits (which are functions of $x_1, x_2, \dots, x_n$) are denoted $\partial y_i/\partial x_j$. The pullback of the 1-form $dy_i$ on the range $\mathbb{R}^m$ is defined to be the 1-form

$$dy_i = \frac{\partial y_i}{\partial x_1}\,dx_1 + \frac{\partial y_i}{\partial x_2}\,dx_2 + \cdots + \frac{\partial y_i}{\partial x_n}\,dx_n \tag{2}$$

on the domain $\mathbb{R}^n$. The pullback of a (variable) k-form

$$A(y_1, y_2, \dots, y_m)\,dy_1\,dy_2 \dots dy_k + \cdots \tag{3}$$

on the range $\mathbb{R}^m$ under the differentiable map (1) is defined to be the k-form in $x_1, x_2, \dots, x_n$ obtained by performing the substitutions (1) and (2) in (3) and using the usual rules ($dx_i\,dx_j = -dx_j\,dx_i$, $dx_i\,dx_i = 0$) for computing sums and products of 1-forms in $x_1, x_2, \dots, x_n$. The result is a (variable) k-form on the domain $\mathbb{R}^n$ of the map (1).
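At a single point each pulled-back 1-form (2) has constant coefficients, so the rules just stated can be carried out mechanically. In the following Python sketch (an illustration; the names `wedge` and `pullback_dy` and the dictionary representation are assumptions, not the book's notation) a form is stored as a dictionary mapping an increasing tuple of indices $(j_1, \dots, j_k)$ to the coefficient of $dx_{j_1} \dots dx_{j_k}$. The product reorders indices using $dx_i\,dx_j = -dx_j\,dx_i$ and drops terms with a repeated index.

```python
def wedge(alpha, beta):
    """Product of two forms with constant coefficients.

    A form is a dict mapping an increasing tuple of indices (j1, ..., jk)
    to the coefficient of dx_j1 ... dx_jk.
    """
    result = {}
    for I, a in alpha.items():
        for J, b in beta.items():
            idx = list(I + J)
            if len(set(idx)) < len(idx):
                continue                      # dx_i dx_i = 0
            sign = 1
            # bubble sort the indices; each swap uses dx_i dx_j = -dx_j dx_i
            for p in range(len(idx)):
                for q in range(len(idx) - 1 - p):
                    if idx[q] > idx[q + 1]:
                        idx[q], idx[q + 1] = idx[q + 1], idx[q]
                        sign = -sign
            key = tuple(idx)
            result[key] = result.get(key, 0) + sign * a * b
    return {k: v for k, v in result.items() if v != 0}

def pullback_dy(row):
    """Pullback (2) of dy_i at a point, given row = (dy_i/dx_1, ..., dy_i/dx_n)."""
    return {(j,): c for j, c in enumerate(row, start=1) if c != 0}

# y1 = x1*x2, y2 = x2 + x3**2 at (1, 2, 3): rows of partials (2, 1, 0) and (0, 1, 6)
print(wedge(pullback_dy([2, 1, 0]), pullback_dy([0, 1, 6])))
# {(1, 2): 2, (1, 3): 12, (2, 3): 6}
```

So at (1, 2, 3) the pullback of $dy_1\,dy_2$ is $2\,dx_1dx_2 + 12\,dx_1dx_3 + 6\,dx_2dx_3$. For a map $\mathbb{R}^2 \to \mathbb{R}^2$ the same computation returns the single coefficient $\partial y_1/\partial x_1 \cdot \partial y_2/\partial x_2 - \partial y_1/\partial x_2 \cdot \partial y_2/\partial x_1$ of $dx_1\,dx_2$.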

The principal theorem of differential calculus is the Chain Rule, which states that the composition $\mathbb{R}^n \to \mathbb{R}^m \to \mathbb{R}^p$ of two differentiable maps is a differentiable map and that the pullback of a k-form under the composed map is equal to the pullback of the pullback.
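For 1-forms the Chain Rule says that the matrix of partial derivatives of the composed map is the product of the two matrices of partial derivatives; for maps $\mathbb{R}^2 \to \mathbb{R}^2 \to \mathbb{R}^2$ and the 2-form $dz_1\,dz_2$ it says that the coefficients of $dx_1\,dx_2$ (the determinants) multiply. A quick numerical check in Python, assuming partial derivatives are approximated by central differences; the maps `f` and `g` are arbitrary examples, not from the text.

```python
def jacobian(F, x, h=1e-6):
    """Matrix of partials dF_i/dx_j at x, approximated by central differences."""
    m = len(F(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp = list(x)
        xp[j] += h
        xm = list(x)
        xm[j] -= h
        fp, fm = F(xp), F(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def matmul(P, Q):
    """Matrix product P Q."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def det2(M):
    """Coefficient of dx1 dx2 in the pullback of dz1 dz2."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

f = lambda x: [x[0] ** 2 - x[1] ** 2, 2 * x[0] * x[1]]
g = lambda y: [y[0] + y[1] ** 3, y[0] * y[1]]
g_after_f = lambda x: g(f(x))

x = [1.0, 0.5]
J_composed = jacobian(g_after_f, x)                 # partials of the composed map
J_product = matmul(jacobian(g, f(x)), jacobian(f, x))   # "pullback of the pullback"
```

Here both matrices are approximately [[5, 5], [2.75, 0.5]], and the determinant of the composite, about −11.25, equals the product −2.25 · 5 of the two determinants.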

In solving a system of equations (1) for $(x_1, x_2, \dots, x_n)$


5.5 | Summary. Differentiable Manifolds 191 


given $(y_1, y_2, \dots, y_m)$ by a process of step-by-step elimination one uses the following theorem, which is proved in §7.1.


Elimination Theorem 


Given an equation of the form

$$y = f(x_1, x_2, \dots, x_n) \tag{4}$$

in which f is a differentiable function and given a point

$$\bar y = f(\bar x_1, \bar x_2, \dots, \bar x_n)$$

at which $\partial f/\partial x_1 \ne 0$, there exist a number $\epsilon > 0$ and a differentiable function $g(y, x_2, x_3, \dots, x_n)$ such that the equation

$$x_1 = g(y, x_2, x_3, \dots, x_n) \tag{5}$$

is equivalent to (4) at all points within ε of $(\bar y, \bar x_1, \bar x_2, \dots, \bar x_n)$, that is, f and g are both defined at all points $(y, x_1, x_2, \dots, x_n)$ where $|y - \bar y| < \epsilon$, $|x_1 - \bar x_1| < \epsilon$, ..., $|x_n - \bar x_n| < \epsilon$ and such a point satisfies (4) if and only if it satisfies (5).
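The theorem asserts that g exists but gives no formula for it. Numerically, one way to evaluate $g(y, x_2, \dots, x_n)$ near the given point is to solve $f(x_1, x_2, \dots, x_n) = y$ for $x_1$ by Newton iteration started at $\bar x_1$, relying on $\partial f/\partial x_1 \ne 0$. Newton's method is swapped in here purely for illustration and is not claimed to be the method of proof in §7.1; the function name `eliminate_x1`, the example f, and the tolerances are assumptions, and the iteration may fail if $(y, x_2, \dots, x_n)$ is not close to the given point.

```python
def eliminate_x1(f, x1_bar, y, rest, tol=1e-12, max_iter=50, h=1e-7):
    """Return x1 with f(x1, *rest) = y, found by Newton's method started at x1_bar."""
    x1 = x1_bar
    for _ in range(max_iter):
        residual = f(x1, *rest) - y
        if abs(residual) < tol:
            return x1
        # central-difference estimate of df/dx1, assumed nonzero near the point
        slope = (f(x1 + h, *rest) - f(x1 - h, *rest)) / (2 * h)
        x1 -= residual / slope
    raise RuntimeError("Newton iteration did not converge")

f = lambda x1, x2: x1 ** 3 + x1 + x2    # df/dx1 = 3*x1**2 + 1 never vanishes
x1_bar, x2_bar = 1.0, 0.0               # y_bar = f(1, 0) = 2
x1 = eliminate_x1(f, x1_bar, 2.5, (0.1,))   # plays the role of g(2.5, 0.1)
```

At the given point itself the iteration returns $\bar x_1$ immediately, as equation (5) requires: $g(\bar y, \bar x_2, \dots, \bar x_n) = \bar x_1$.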


Using the Elimination Theorem and the Chain Rule, one can prove the Implicit Function Theorem: A system of equations (1) can be reduced to the form

$$x_i = g_i(y_1, y_2, \dots, y_r, x_{r+1}, \dots, x_n) \qquad (i = 1, 2, \dots, r) \tag{2a}$$

$$y_i = h_i(y_1, y_2, \dots, y_r) \qquad (i = r + 1, \dots, m) \tag{2b}$$

locally near a given point $(\bar x_1, \bar x_2, \dots, \bar x_n)$ by differentiable functions $g_i, h_i$ if and only if (i) the coefficient of $dx_1\,dx_2 \dots dx_r$ in the pullback of $dy_1\,dy_2 \dots dy_r$ under (1) is not zero at $(\bar x_1, \bar x_2, \dots, \bar x_n)$ and (ii) the pullback of any k-form under (1) is identically zero near $(\bar x_1, \bar x_2, \dots, \bar x_n)$ when k > r.

The map (1) is said to be non-singular of rank r at the point $(\bar x_1, \bar x_2, \dots, \bar x_n)$ if (i) and (ii) of the Implicit Function Theorem are satisfied by some rearrangement of the x's and y's. This is true if and only if the pullback of some r-form is not zero at $(\bar x_1, \bar x_2, \dots, \bar x_n)$, but the pullback of every (r + 1)-form is identically zero near $(\bar x_1, \bar x_2, \dots, \bar x_n)$. If this is not the case for any r then $(\bar x_1, \bar x_2, \dots, \bar x_n)$ is said to be a singularity of the map (1).
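By the rules for products of 1-forms, the coefficients of the pullback of an r-form $dy_{i_1}\dots dy_{i_r}$ at a point are the r × r minors of the matrix $(\partial y_i/\partial x_j)$ there. The following Python sketch (hypothetical helper names; exact integer entries are assumed so that "not zero" is meaningful) finds the largest r for which some r × r minor is nonzero at one point. It is only a pointwise computation: the definition above also requires the pullbacks of (r + 1)-forms to vanish identically near the point, which a single evaluation cannot confirm.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def rank_at_point(J):
    """Largest r such that some r x r minor of J (the partials dy_i/dx_j at the point)
    is nonzero, i.e. some r-form dy_i1 ... dy_ir has nonzero pullback there."""
    m, n = len(J), len(J[0])
    for r in range(min(m, n), 0, -1):
        for rows in combinations(range(m), r):
            for cols in combinations(range(n), r):
                if det([[J[i][j] for j in cols] for i in rows]) != 0:
                    return r
    return 0

# y1 = x1 + x2, y2 = 2*x1 + 2*x2, y3 = x3: the same matrix of partials at every point
print(rank_at_point([[1, 1, 0], [2, 2, 0], [0, 0, 1]]))   # 2
```

In this example the minors are constant, so the pullback of every 3-form is identically zero and the map is non-singular of rank 2 everywhere. By contrast, for $y_1 = x_1^2$ at $x_1 = 0$ the matrix is (0) and the pointwise value is 0, yet $dy_1 = 2x_1\,dx_1$ is not identically zero near 0; by the definition above the origin is a singularity, which the pointwise computation alone would miss.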

Just as the geometrical meaning of the Implicit Function Theorem for affine maps was clarified by the concept of an ‘affine manifold’, the geometrical meaning of the Implicit Function Theorem for differentiable maps is clarified by the concept of a ‘differentiable manifold’: A k-dimensional differentiable manifold in $\mathbb{R}^n$ defined by local parameters is a subset M of $\mathbb{R}^n$ with the property that for each point P of M there is a number $\epsilon > 0$ and a differentiable map $f: \mathbb{R}^k \to \mathbb{R}^n$ which is non-singular of rank k at all points where it is defined and which has the property that a point of $\mathbb{R}^n$ which lies within ε of P is in M if and only if it is in the image of f. In short, M is parameterized near P by k independent real variables. A k-dimensional differentiable manifold in $\mathbb{R}^n$ defined by local equations is a subset M of $\mathbb{R}^n$ with the property that for each point P of M there is an $\epsilon > 0$ and a differentiable map $f: \mathbb{R}^n \to \mathbb{R}^{n-k}$ which is defined and non-singular of rank n − k at all points Q within ε of P, and which has the property that such a point Q is in M if and only if f(Q) = f(P). In short, M is defined near P by n − k independent equations.

The Implicit Function Theorem shows that every manifold defined by local parameters can be defined by local equations and vice versa (Exercise 1); hence the concept of a k-dimensional differentiable manifold in $\mathbb{R}^n$ is well defined.

In terms of this concept, the equations (2b) of the Implicit Function Theorem can be interpreted as stating that the image of a non-singular differentiable map is locally a differentiable manifold. The equations (2b) define the image both by parameters $(y_1, y_2, \dots, y_r)$ and by equations ($y_i - h_i(y_1, y_2, \dots, y_r) = 0$). The dimension of the image is equal to the rank of the map. Similarly, the equations (2a) state that the level surfaces of a non-singular differentiable map are differentiable manifolds. The equations (2a) define the level surfaces both by parameters $(x_{r+1}, x_{r+2}, \dots, x_n)$ and by equations ($x_i - g_i(y_1, y_2, \dots, y_r, x_{r+1}, \dots, x_n) = 0$ where $(y_1, y_2, \dots, y_r)$ is fixed). The dimension of the level surfaces is n − r where r is the rank of the map.

The method of Lagrange Multipliers can be stated in terms of the concept of ‘differentiable manifold’ as follows: Let M be a differentiable manifold in $\mathbb{R}^n$, let $y = f(x_1, x_2, \dots, x_n)$ be a differentiable function defined at all points of M, and consider the problem

$$P \text{ in } M, \qquad y = \max. \text{ at } P.$$

If P is a solution of this problem and if $X_1, X_2, \dots, X_k$ are differentiable functions such that M near P is defined by the equations

$$X_i = \text{const.} \qquad (i = 1, 2, \dots, k) \tag{6}$$

(hence M is (n − k)-dimensional) then there exist numbers $\lambda_1, \lambda_2, \dots, \lambda_k$ such that

$$dy = \lambda_1\,dX_1 + \lambda_2\,dX_2 + \cdots + \lambda_k\,dX_k \tag{7}$$

at P. The equations (6) and (7) give n + k equations in the n + k unknowns $(x_1, x_2, \dots, x_n, \lambda_1, \lambda_2, \dots, \lambda_k)$ where $(x_1, x_2, \dots, x_n)$ are the coordinates of P. These equations can be solved explicitly to find the coordinates of P in certain simple cases, and can be used to deduce useful statements about the possible solutions P in many others. The equation (7) was deduced from the Implicit Function Theorem in §5.4.


Exercises


1 Prove that a k-dimensional differentiable manifold defined 
by local parameters can be defined by local equations and vice 
versa. 


2 Show that the equation $x_1^2 + x_2^2 + \cdots + x_n^2 = K$ defines a differentiable manifold in $\mathbb{R}^n$ whose dimension is n − 1 when K > 0 and whose dimension is 0 when K = 0.


3 Is the folium of Descartes [Exercise 9, §5.1] a 1-dimensional 
differentiable manifold in the xy-plane? 


4 Prove that the set of all invertible 2 × 2 matrices

$$\begin{pmatrix} x_1 & x_2 \\ x_3 & x_4 \end{pmatrix}$$

is a differentiable manifold in $\mathbb{R}^4$. What is its dimension?


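5 A 2 × 2 matrix is said to be an orthogonal matrix if it satisfies

$$\begin{pmatrix} x_1 & x_2 \\ x_3 & x_4 \end{pmatrix}\begin{pmatrix} x_1 & x_3 \\ x_2 & x_4 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$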




that is, if its transpose is equal to its inverse. Prove that the set of all 2 × 2 orthogonal matrices is a differentiable manifold in $\mathbb{R}^4$. What is its dimension? Give an explicit definition of this manifold by parameters near $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.


6 An n × n matrix is said to be an orthogonal matrix if its transpose is equal to its inverse. [For instance, the equation (17) of §5.4 says that a certain 3 × 3 matrix is an orthogonal matrix.] Show that the set of orthogonal matrices is a differentiable manifold in $\mathbb{R}^{n^2}$ and find its dimension.


7 Envelopes. An equation of the form f(x, y, a) = const. 
can be imagined as defining a curve in the xy-plane for each 
fixed value of a; as a varies the equation then defines a family 
of curves in the xy-plane. For example, 


(*) $(x - a)^2 + y^2 = 1$


describes the family of circles in the xy-plane whose radii are 
1 and whose centers lie on the x-axis. An envelope of a family 
of curves in the xy-plane is a curve C with the property that 
for each point P of C there is a curve of the family through 
P tangent to C. [Thus, in particular, any curve of the family 
is an envelope of the family.] The standard method of finding 
envelopes C of f(x, y,a) = const. which are not themselves 
curves of the family is to eliminate a from the equations 


$$f(x, y, a) = \text{const.}$$
$$f_a(x, y, a) = 0$$

where $f_a$ denotes the partial derivative of f with respect to a.
This gives an equation of the form g(x, y) = const. which 
defines an envelope of the family. 
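As an illustration of the elimination process, using an example family chosen here rather than one from the exercises, consider the lines $f(x, y, a) = x\cos a + y\sin a = 1$. Then $f_a = -x\sin a + y\cos a$, and for each a the two equations are linear in (x, y). The Python sketch below (the function name `envelope_point` is hypothetical) solves them by Cramer's rule; every solution lies on $x^2 + y^2 = 1$, which is the equation g(x, y) = const. obtained by eliminating a, and the unit circle is indeed tangent to each line of the family.

```python
import math

def envelope_point(a):
    """Solve f = 1, f_a = 0 for (x, y), where f(x, y, a) = x cos a + y sin a."""
    # f(x, y, a) = 1    ->   cos(a) x + sin(a) y = 1
    # f_a(x, y, a) = 0  ->  -sin(a) x + cos(a) y = 0
    a11, a12, b1 = math.cos(a), math.sin(a), 1.0
    a21, a22, b2 = -math.sin(a), math.cos(a), 0.0
    det = a11 * a22 - a12 * a21            # equals cos^2 a + sin^2 a = 1
    x = (b1 * a22 - a12 * b2) / det        # Cramer's rule
    y = (a11 * b2 - b1 * a21) / det
    return x, y

# Eliminating a: every such point satisfies g(x, y) = x^2 + y^2 = 1.
points = [envelope_point(2 * math.pi * k / 12) for k in range(12)]
```

Here the elimination can be done by hand: the solution is $(x, y) = (\cos a, \sin a)$, so a drops out through $\cos^2 a + \sin^2 a = 1$.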


(a) Apply this process to find envelopes of the family (*) 
above. 

(b) Prove that this process indeed gives an envelope under the following conditions: If $(\bar x, \bar y, \bar a)$ is a point where the equations f = const., $f_a = 0$ are satisfied, and if $f_a$ is a differentiable function with $f_{aa} \ne 0$ near $(\bar x, \bar y, \bar a)$, then the equation $f_a(x, y, a) = 0$ can be solved locally near $(\bar x, \bar y, \bar a)$ to give a as a function of x and y. Substituting in the equation f(x, y, a) = const. gives an equation of the form g(x, y) = const. If the determinant of partial derivatives

$$\begin{vmatrix} f_x & f_y \\ f_{xa} & f_{ya} \end{vmatrix}$$

is not zero near $(\bar x, \bar y, \bar a)$ then the curve g(x, y) = const. is an envelope.




(c) The trajectory of a shell fired from (0, 0) is the curve

$$x = v_x t, \qquad y = v_y t - \tfrac12 g t^2 \tag{†}$$

where $(v_x, v_y)$ is the velocity with which the shell is fired and g is the acceleration of gravity. Show that the family of all trajectories (†) for a fixed value of $v = \sqrt{v_x^2 + v_y^2}$ has an envelope which is a parabola. This is the ‘parabola of safety’ beyond which the shell cannot be fired.*

*For envelopes see W. F. Osgood, Advanced Calculus, The Macmillan Co., 1925, Chapter VIII.


integral calculus 


chapter 6 


6.1 


Summary

The integral $\int_S \omega$ was defined in Chapter 2 for ω a continuous k-form on n-space and for S a k-dimensional domain in n-space, parameterized on a bounded domain D of k-space. (In Chapter 2 only the cases n ≤ 3 were considered, but the same definitions apply, with only minor modifications, to cases where n > 3.) The definition had very serious defects: it was necessary to parameterize a domain in order to define integrals over it, and it was never proved that such a parameterization can be given nor that the resulting number $\int_S \omega$ is independent of the choice of the parameterization. This chapter is devoted to the rigorous definition of $\int_S \omega$ for a continuous k-form ω on n-space and for a suitably general class of k-dimensional domains S in n-space, namely, the class of compact, oriented, differentiable, k-dimensional manifolds-with-boundary.

The subject of this chapter is merely the definition of $\int_S \omega$, an intuitive definition of which is very easily given in simple cases, and the proof that this definition has all the properties which would have been expected on the basis of the intuitive definition. Such definitions and proofs are of great theoretical importance, but have little practical significance. In practice, integrals $\int_S \omega$ are rarely evaluated. The integrand (the field) and the concept of integration (which gives the field its meaning) are the important ideas, and the actual number $\int_S \omega$ is usually



H.M. Edwards, Advanced Calculus: A Differential Forms Approach, Modern Birkhauser Classics, 
DOI 10.1007/978-0-8176-8412-9_6, © Harold M. Edwards 2014 


6.2 | k-Dimensional Volume 197 


of no interest. When it is actually necessary to perform an integration, the domain of integration S (as well as the integrand ω) must be relatively simple for there to be any hope of success: simple enough that the parameterization of S, and hence a definition of $\int_S \omega$, would present no problem.


6.2


k-Dimensional Volume

*Of course ‘volume’ normally means ‘three-dimensional’. Note that ‘one-dimensional volume’ is length and ‘two-dimensional volume’ is area. The term ‘k-dimensional content’ is often used instead of ‘k-dimensional volume’.

The intuitive meaning of the 2-form dx dy on xyz-space is ‘oriented area of the projection on the xy-plane’, a function assigning numbers to surfaces in xyz-space. It was on the basis of this intuitive idea that the algebraic rules governing 2-forms and their pullbacks under affine maps were derived in Chapter 1. Similarly, the algebra of 3-forms was based on ‘oriented volume’. In Chapter 4, the algebra of k-forms was defined as a natural extension of the algebra of 2-forms and 3-forms, and this algebra was found to be very useful in stating and proving such basic theorems as the Chain Rule, the Implicit Function Theorem, and the method of Lagrange multipliers in Chapter 5. However, it has not yet been proved that 2-forms actually do describe areas or that 3-forms describe volumes, when areas and volumes are defined (as they must be) by integrals. This section is devoted to proving that the pullback operation on k-forms, as defined algebraically in Chapter 4, does indeed have a meaning in terms of ‘k-dimensional volume’*, as defined by an integral in the obvious way:


Definition 


Let D be a bounded subset of $x_1x_2 \dots x_k$-space. The k-dimensional volume of D, denoted $\int_D dx_1\,dx_2 \dots dx_k$, is defined as follows: Let B be a number such that all coordinates of all points of D are less than B in absolute value. In other words, let B be a number such that D is contained in the k-dimensional cube $\{(x_1, x_2, \dots, x_k) : |x_i| < B,\ i = 1, 2, \dots, k\}$. An approximating sum $\Sigma(\alpha)$ to $\int_D dx_1\,dx_2 \dots dx_k$ is formed by choosing

(i) a subdivision of each of the k intervals $\{-B < x_i < B\}$ into small subintervals, thereby subdividing the cube $\{|x_i| < B\}$ into k-dimensional ‘rectangles’ which will be denoted generically by $R_\alpha$, and

(ii) a point $P_\alpha$ in each of the ‘rectangles’ $R_\alpha$,


Chapter6 | Integral Calculus 198 


and by setting

$$\Sigma(\alpha) = \sum_{P_\alpha \text{ in } D} [\text{k-volume of } R_\alpha]$$

where the k-volume of $R_\alpha$ is defined to be the product of the k dimensions of $R_\alpha$. The mesh size of the choices α, denoted |α|, is defined to be the largest number which occurs as one of the dimensions of one of the rectangles $R_\alpha$. The integral $\int_D dx_1\,dx_2 \dots dx_k$ is said to converge if the Cauchy Convergence Criterion is satisfied:

For every $\epsilon > 0$ there is a $\delta > 0$ such that $|\Sigma(\alpha) - \Sigma(\alpha')| < \epsilon$ whenever $|\alpha| < \delta$, $|\alpha'| < \delta$.

When this is the case the approximating sums determine a real number, called the k-dimensional volume of D and denoted by $\int_D dx_1\,dx_2 \dots dx_k$.* When this is not the case it is said that $\int_D dx_1\,dx_2 \dots dx_k$ does not exist or that the k-dimensional volume of D is not defined.

*To find $\int_D dx_1\,dx_2 \dots dx_k$ to an accuracy of n decimal places set $\epsilon = 10^{-n}$, choose δ as in the Cauchy Criterion, choose α with $|\alpha| < \delta$, and compute $\Sigma(\alpha)$.
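The definition can be tried out directly. The Python sketch below (the helper name `approximating_sum` is hypothetical) uses one particular family of choices α: each interval $\{-B < x_i < B\}$ is cut into N equal subintervals and $P_\alpha$ is the center of $R_\alpha$, so $|\alpha| = 2B/N$. Other subdivisions allowed by the definition are not explored, so this illustrates the Cauchy Criterion rather than verifying it.

```python
import itertools
import math

def approximating_sum(in_D, k, B, N):
    """Sigma(alpha) for the uniform subdivision of {|x_i| < B} into N^k cubes,
    with P_alpha the center of each cube; in_D(P) tests whether P is in D."""
    h = 2 * B / N                                   # mesh size |alpha|
    centers = [-B + (i + 0.5) * h for i in range(N)]
    count = sum(1 for P in itertools.product(centers, repeat=k) if in_D(P))
    return count * h ** k                           # each R_alpha has k-volume h^k

unit_disk = lambda P: P[0] ** 2 + P[1] ** 2 < 1
for N in (50, 100, 200, 400):
    print(N, approximating_sum(unit_disk, 2, 2.0, N))   # values approach math.pi as N grows
```

With B = 2 and N = 40 the same function applied to the unit cube $\{0 < x_i < 1\}$ in three variables counts exactly 10 centers along each axis, giving 1 up to rounding (compare Exercise 3).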


Theorem 

Let

$$y_i = \sum_{j=1}^{k} a_{ij} x_j + b_i \qquad (i = 1, 2, \dots, k) \tag{1}$$

be an affine map $\mathbb{R}^k \to \mathbb{R}^k$, let D be a bounded domain in $x_1x_2 \dots x_k$-space such that $\int_D dx_1\,dx_2 \dots dx_k$ converges, and let f(D) denote the image of D under the map (1). Then $\int_{f(D)} dy_1\,dy_2 \dots dy_k$ converges and

$$\int_{f(D)} dy_1\,dy_2 \dots dy_k = \pm \frac{\partial(y_1, y_2, \dots, y_k)}{\partial(x_1, x_2, \dots, x_k)} \int_D dx_1\,dx_2 \dots dx_k,$$

where the sign is determined by the condition that both integrals are by definition not negative.
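A numerical illustration of the theorem for k = 2, assuming D is the unit square and the linear part of the map is invertible: the Python sketch forms the approximating sum $\Sigma(\alpha)$ for f(D) using a uniform subdivision with centers as the points $P_\alpha$, testing membership in f(D) by applying the inverse map. The result should be close to the absolute value of $\partial(y_1, y_2)/\partial(x_1, x_2)$ times the area 1 of D. The function `image_area` and its parameters are hypothetical, and a uniform grid is only one of the many subdivisions the definition allows.

```python
def image_area(M, b, B, N):
    """Approximating sum for the area of f(D), D = unit square, f(x) = M x + b.

    A point y lies in f(D) exactly when M^{-1}(y - b) lies in D (M assumed invertible).
    The square {|y_i| < B} is cut into N x N equal squares; centers serve as P_alpha.
    """
    (p, q), (r, s) = M
    det = p * s - q * r
    h = 2 * B / N
    count = 0
    for i in range(N):
        y1 = -B + (i + 0.5) * h - b[0]
        for j in range(N):
            y2 = -B + (j + 0.5) * h - b[1]
            x1 = (s * y1 - q * y2) / det       # inverse of a 2 x 2 matrix
            x2 = (-r * y1 + p * y2) / det
            if 0 <= x1 <= 1 and 0 <= x2 <= 1:
                count += 1
    return count * h * h

shear_area = image_area(((1, 1), (0, 1)), (0.0, 0.0), 5.0, 400)      # the shear (iv), det 1
general_area = image_area(((2, 1), (1, 3)), (0.5, -1.0), 5.0, 400)   # det 5
```

Here `shear_area` comes out close to 1 and `general_area` close to 5; in each column of the grid the count of centers is off by at most one, which keeps the error below 0.1 for these two examples.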


This theorem gives a precise formulation of the intuitive meaning of the pullback operation, namely, that the pullback is a composed function assigning numbers to k-dimensional domains D in $x_1x_2 \dots x_k$-space by ‘evaluating’ the given form $dy_1\,dy_2 \dots dy_k$ on the image of D under the given map (1). The ambiguity of sign arises from the fact that $dy_1\,dy_2 \dots dy_k$ and $dx_1\,dx_2 \dots dx_k$ represent oriented volumes, whereas $\int_D dx_1\,dx_2 \dots dx_k$




and $\int_{f(D)} dy_1\,dy_2 \dots dy_k$ were defined without reference to any orientation of D or f(D).


Proof 


If $f: \mathbb{R}^k \to \mathbb{R}^k$ and $g: \mathbb{R}^k \to \mathbb{R}^k$ are affine maps for which the theorem is true, then the theorem is obviously true for the affine map $f \circ g$: writing dx, dy, dz for $dx_1 \dots dx_k$, $dy_1 \dots dy_k$, $dz_1 \dots dz_k$,

$$\int_{g(D)} dy \text{ converges and is } \left|\frac{\partial y}{\partial x}\right| \int_D dx,$$

$$\int_{f(g(D))} dz \text{ converges and is } \left|\frac{\partial z}{\partial y}\right| \int_{g(D)} dy = \left|\frac{\partial z}{\partial y}\,\frac{\partial y}{\partial x}\right| \int_D dx = \left|\frac{\partial z}{\partial x}\right| \int_D dx,$$

as desired. Therefore it suffices to show that every affine map $\mathbb{R}^k \to \mathbb{R}^k$ can be written as a composition of simple maps for which the theorem is true. Now every affine map $\mathbb{R}^k \to \mathbb{R}^k$ can be written (see Exercise 4) as a composition of:

(i) Translations $y_i = x_i + \text{const.}$
(ii) Multiplication of coordinates by scale factors $y_i = \text{const.} \cdot x_i$ (possibly negative or zero).
(iii) Interchanges of coordinates. (Each $y_i$ is a different $x_j$.)
(iv) The shear $y_1 = x_1 + x_2$, $y_2 = x_2$, $y_3 = x_3$, ..., $y_k = x_k$.

The types (i)-(iii) carry rectangles to rectangles and approximating sums to approximating sums, which fact makes the theorem for these maps an immediate consequence of the definition of $\int_D dy_1\,dy_2 \dots dy_k$. Thus it suffices to prove the theorem for the map (iv). This can be done by the following steps which are left to the reader to prove:


(a) If D is a rectangle then $\int_{f(D)} dy_1\,dy_2 \dots dy_k$ converges (where f is the map (iv)).

(b) The integral of (a) is equal to $\int_D dx_1\,dx_2 \dots dx_k$.

(c) The formula

$$\int_{f(D)} dy_1\,dy_2 \dots dy_k = \int_D dx_1\,dx_2 \dots dx_k$$

holds whenever D is a finite union of rectangles.




(d) If $\int_D dx_1\,dx_2 \dots dx_k$ converges to the number V and if $\epsilon > 0$ then there exist domains $\underline{D}$, $\overline{D}$ which are finite unions of rectangles such that $\underline{D}$ is contained in D, such that D is contained in $\overline{D}$, and such that

$$V - \epsilon < \int_{\underline{D}} dx_1 \dots dx_k \le \int_{\overline{D}} dx_1 \dots dx_k < V + \epsilon.$$

(e) All sufficiently fine approximating sums to $\int_{f(D)} dy_1\,dy_2 \dots dy_k$ are at least $V - 2\epsilon$ and at most $V + 2\epsilon$ where ε is arbitrarily small; hence $\int_{f(D)} dy_1\,dy_2 \dots dy_k$ converges to V, as was to be shown.


Exercises

1 The number π is defined to be the area of the unit circle $\{x^2 + y^2 < 1\}$. Find the area of the ellipse

$$\left(\frac{x}{a}\right)^2 + \left(\frac{y}{b}\right)^2 = 1, \qquad a \ne 0,\ b \ne 0.$$

2 Find the formula for the volume of the tetrahedron with vertices $(x_0, y_0, z_0)$, $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, $(x_3, y_3, z_3)$.

3 Prove that if D is the ‘unit cube’ $\{0 < x_i < 1;\ i = 1, 2, \dots, k\}$ then $\int_D dx_1\,dx_2 \dots dx_k$ converges to 1. [Give an explicit estimate of the error $|\Sigma(\alpha) - 1|$ in terms of |α|.]

4 Prove that every affine map $\mathbb{R}^k \to \mathbb{R}^k$ can be written as a composition of the types (i)-(iv).

5 Prove the statements (a)-(e) of the text.


6.3 


Independence of Parameter and the Definition of $\int_S \omega$

The technical difficulties involved in giving a precise definition of the integral of a 2-form over a surface in 3-space are just as great as those involved in giving a precise definition of the integral of a k-form over a k-dimensional manifold in n-space. Therefore, in order to simplify the notation and in order to make the geometrical ideas involved in the definition as clear as possible, the definitions and proofs will be given for the case k = 2, n = 3. The definitions and proofs for general k and n are identical to these.

Let $\omega = A\,dy\,dz + B\,dz\,dx + C\,dx\,dy$ be a continuous


6.3 | Independence of Parameter and the Definition of fœ 201


2-form on xyz-space and let S be an oriented surface in xyz-space. The greatest difficulty in defining the number $\int_S \omega$ actually lies in defining precisely what is meant by ‘an oriented surface in xyz-space’. This difficulty is essentially overcome by the Implicit Function Theorem, which makes it possible to define a 2-dimensional differentiable manifold in xyz-space as a set which locally can be described by two independent parameters or by one non-singular equation. Since the integral of a 2-form over a parameterized surface can be defined (§2.4) to be the integral of the pullback over the parameter space, this leaves just the following problems:


How can several local parameterizations of a 2-dimensional differentiable manifold S be put together to give a definition of $\int_S \omega$?

How are orientations to be described?

How can the number $\int_S \omega$ be proved to be independent of the choices of local parameterizations used to define it?


In §2.5 it was shown that when S is the sphere $\{x^2 + y^2 + z^2 = 1\}$ the integral $\int_S \omega$ can be defined in various ways by cutting S into several pieces (say into two hemispheres), by parameterizing each piece (say by stereographic projection of the hemispheres onto disks), and by defining $\int_S \omega$ to be the sum of the integrals over the pieces. Although this is a natural and simple procedure for a particular surface such as the sphere, it is extremely difficult to prove that an arbitrary 2-dimensional differentiable manifold can be cut into simple pieces, each of which can be parameterized. Therefore, for the case of an arbitrary surface, some other solution to the first problem above would be desirable.

Instead of decomposing S (writing S as a union of pieces $S_i$ so that $\int_S \omega = \sum_i \int_{S_i} \omega$) one can decompose ω (write $\omega = \sum_i \omega_i$ so that $\int_S \omega = \sum_i \int_S \omega_i$). If this can be done in such a way that $\omega = \omega_1 + \omega_2 + \cdots + \omega_N$ is written as a sum of 2-forms $\omega_i$, each of which is identically zero except on a small portion of S which can be parameterized, then $\int_S \omega$ can be defined to be $\sum_i \int_S \omega_i$, and each integral $\int_S \omega_i$ can be defined by parameterizing that portion of S where $\omega_i$ is not zero. This process is very unwieldy from a practical standpoint because the




only 2-forms which are identically zero except in a small region are very artificial and the integrals $\int_S \omega_i$, although they are easy to define, are impossible to compute. Nonetheless, from a theoretical standpoint the method of decomposing ω is much simpler than that of decomposing S.

The main reason that decomposing ω is simpler than decomposing S is that the parameterized portions may now overlap and in fact should overlap in such a way that every point of S is inside one parameterized portion. Hence there is no need to cut S, and locally S is simply a surface which can be defined by two local parameters. Globally, the definition of $\int_S \omega$ makes two further assumptions about S, namely, that all of S can be covered by a finite number of simple parameterized portions, and that these portions can be oriented in a consistent way. Specifically, it will be assumed that S is a subset of xyz-space which is described in the following way:

(a) There is given a finite number of differentiable maps $F_1, F_2, \dots, F_N$ of the uv-plane to xyz-space. Each map $F_i$ is non-singular of rank 2 at all points of the square $\{|u| < 1, |v| < 1\}$ and carries points of this square one-to-one to points of S. The maps $F_i$ are called ‘charts’.

(b) For each point P of S there exists at least one chart $F_i$ such that P lies inside the image of $\{|u| < 1, |v| < 1\}$ under $F_i$ and such that the surface S near P coincides with the image of $F_i$. That is, $P = F_i(\bar u, \bar v)$ where $|\bar u| < 1$, $|\bar v| < 1$ and there is an $\epsilon > 0$ such that a point Q = (x, y, z) within ε of $P = (\bar x, \bar y, \bar z)$ is a point of S if and only if it is of the form $Q = F_i(u, v)$ for some (u, v) with |u| < 1, |v| < 1.

(c) The charts $F_1, F_2, \dots, F_N$ are consistently oriented in the following sense: If P is a point of S which lies inside the image of two different charts $F_i$, $F_j$ (that is, $P = F_i(\bar u, \bar v)$, $P = F_j(\tilde u, \tilde v)$ where $|\bar u| < 1$, $|\bar v| < 1$, $|\tilde u| < 1$, $|\tilde v| < 1$) then by the Implicit Function Theorem (see Exercise 2) the map $F_j^{-1} \circ F_i: \mathbb{R}^2 \to \mathbb{R}^2$ is defined near $F_i^{-1}(P) = (\bar u, \bar v)$. The charts $F_1, F_2, \dots, F_N$ are said to be consistently oriented if all such maps $F_j^{-1} \circ F_i$ of the uv-plane to itself preserve the sign of du dv, in other words, if the pullback of du dv under $F_j^{-1} \circ F_i$ near $(\bar u, \bar v)$ is a positive multiple of du dv. The same condition can be stated more simply






(but less geometrically) as follows: If P is a point of S which lies in the image of two different charts $F_i$, $F_j$ and if ω is a 2-form on xyz-space defined at P, then the pullback of ω under $F_i$ at $F_i^{-1}(P)$ is a positive multiple of the pullback of ω under $F_j$ at $F_j^{-1}(P)$. (For the proof of the equivalence of these two definitions see Exercise 2.)

*Loosely speaking, ‘compact’ means exactly what it means in everyday English: firmly united, arranged within a relatively small space.


Example 


The sphere $\{x^2 + y^2 + z^2 = 1\}$ oriented by $x\,dy\,dz + y\,dz\,dx + z\,dx\,dy$ can be described by charts in this way. To do this it suffices to use the stereographic projection of the sphere minus the ‘north pole’ (0, 0, 1) onto the uv-plane and the stereographic projection of the sphere minus the ‘south pole’ (0, 0, −1) onto the uv-plane. By introducing a scale factor in the uv-plane the square $\{|u| < 1, |v| < 1\}$ can be made to parameterize all but a small area around the omitted pole and, in particular, each of these charts can be made to parameterize an entire hemisphere. By orienting each of these two charts (e.g. changing u to −u if the orientation is wrong) to agree with $x\,dy\,dz + y\,dz\,dx + z\,dx\,dy$, they agree with each other and the requirements (a)-(c) are fulfilled. More picturesquely, it is useful to think of a geographical atlas, which consists of a finite number of maps, oriented in a consistent way (the earth as seen from above), such that each point of the earth’s surface is pictured in at least one map (and not at the edge of the map).
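The orientation check in this example can be carried out numerically. The coefficient of du dv in the pullback of $x\,dy\,dz + y\,dz\,dx + z\,dx\,dy$ under a chart F(u, v) is the triple product $F \cdot (F_u \times F_v)$. In the Python sketch below, the names, the particular formulas for the two inverse stereographic projections (onto the equatorial plane), and the finite-difference derivatives are all assumptions of the sketch. The chart omitting the south pole comes out positively oriented and the chart omitting the north pole negatively, so changing u to −u in the latter makes the two charts agree, as the text says. The charts are used unscaled; a positive scale factor, such as the one needed to fit a hemisphere into $\{|u| < 1, |v| < 1\}$, does not affect the signs.

```python
def chart_minus_south(u, v):
    """Inverse stereographic projection from the south pole: the sphere minus (0, 0, -1)."""
    r2 = u * u + v * v
    return (2 * u / (1 + r2), 2 * v / (1 + r2), (1 - r2) / (1 + r2))

def chart_minus_north(u, v):
    """Inverse stereographic projection from the north pole: the sphere minus (0, 0, 1)."""
    r2 = u * u + v * v
    return (2 * u / (1 + r2), 2 * v / (1 + r2), (r2 - 1) / (1 + r2))

def pullback_coefficient(F, u, v, h=1e-6):
    """Coefficient of du dv in the pullback of x dy dz + y dz dx + z dx dy under F,
    i.e. F . (F_u x F_v), with F_u and F_v approximated by central differences."""
    x, y, z = F(u, v)
    Fu = [(p - m) / (2 * h) for p, m in zip(F(u + h, v), F(u - h, v))]
    Fv = [(p - m) / (2 * h) for p, m in zip(F(u, v + h), F(u, v - h))]
    cx = Fu[1] * Fv[2] - Fu[2] * Fv[1]      # from y_u z_v - y_v z_u (the dy dz term)
    cy = Fu[2] * Fv[0] - Fu[0] * Fv[2]      # from z_u x_v - z_v x_u (the dz dx term)
    cz = Fu[0] * Fv[1] - Fu[1] * Fv[0]      # from x_u y_v - x_v y_u (the dx dy term)
    return x * cx + y * cy + z * cz

flipped_north = lambda u, v: chart_minus_north(-u, v)    # change u to -u

samples = [(-0.9 + 0.3 * i, -0.9 + 0.3 * j) for i in range(7) for j in range(7)]
```

For the chart omitting the south pole the coefficient is $4/(1 + u^2 + v^2)^2$, the familiar area factor of stereographic projection; for the chart omitting the north pole it is the negative of this, and after the change of u to −u it is positive again.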


A surface S in xyz-space which can be described by charts $F_1, F_2, \dots, F_N$ in this way is called a compact, oriented, differentiable surface in xyz-space. The word ‘compact’ is being used here in a technical sense which will not be defined here (see §9.4 for this definition*); only the entire phrase ‘compact, oriented, differentiable surface’ as defined by (a)-(c) will be used here. The word ‘oriented’ means geometrically that given three nearby non-collinear points $P_0, P_1, P_2$ of S the rotational direction on S which they describe can be classed as ‘positive’ or ‘negative’ according to whether the corresponding points of the uv-plane under some (and hence under any) chart F describe a counterclockwise or a clockwise direction. However, ‘oriented’, like ‘compact’, will be used here only in the entire phrase above and will not be defined separately. Two sets of charts $F_1, F_2, \dots, F_N$ and $G_1, G_2, \dots, G_M$ satisfying (a)-(c) will be said to



describe the same compact, oriented, differentiable surface if they describe the same set S, and if at any point P of S contained in charts $F_i$, $G_j$ the orientations of the charts $F_i$, $G_j$ agree in the sense of (c).


Theorem 


Let S be a compact, oriented, differentiable surface in xyz-space, and let ω = A dy dz + B dz dx + C dx dy be a continuous 2-form on xyz-space which is defined at all points of S. Then a number ∫_S ω depending only on ω and S can be defined as follows. Let F_1, F_2, ..., F_N be a specific description of S by charts satisfying (a)-(c) above. Then there exist continuous 2-forms ω_1, ω_2, ..., ω_N defined at all points of S such that ω = ω_1 + ω_2 + ... + ω_N and such that ω_i is zero at all points of S other than those which lie in the image under F_i of a square {|u| < 1 − δ_i, |v| < 1 − δ_i} contained in {|u| < 1, |v| < 1} (δ_i a small positive number). Let F_i*(ω_i) denote the pullback of ω_i under the map F_i. Then the integrals


$$\int_{\{|u| < 1,\ |v| < 1\}} F_i^*(\omega_i),$$


in which the domain of integration {|u| < 1, |v| < 1} is oriented by du dv, are defined as in §2.3. These integrals all converge, hence the number Σ_{i=1}^{N} ∫ F_i*(ω_i) is defined (all integrals being over {|u| < 1, |v| < 1} oriented by du dv). This number can be defined to be ∫_S ω because it is in fact independent of the choices. That is, suppose G_1, G_2, ..., G_M is another set of charts describing the same compact, oriented, differentiable surface S, and ω = θ_1 + θ_2 + ... + θ_M, where θ_j is a continuous 2-form on S which is zero at all points of S other than the image of {|u| < 1 − d_j, |v| < 1 − d_j} under G_j (d_j a small positive number). Then

$$\sum_{i=1}^{N} \int F_i^*(\omega_i) = \sum_{j=1}^{M} \int G_j^*(\theta_j) \qquad (1)$$

where all integrals are over {|u| < 1, |v| < 1} oriented by du dv.


Proof 


The remainder of this section is devoted to the proof of this theorem. The first statement to be proved is that if F_1, F_2, ..., F_N and ω are given, then there is a decomposition ω = ω_1 + ω_2 + ... + ω_N as stated in the theorem.

6.3 | Independence of Parameter and the Definition of ∫_S ω 205

For each point P of S let c_P be a continuous function on xyz-space such that:

(a) all values of c_P are ≥ 0;
(b) the value of c_P at P is > 0;
(c) all points of S where c_P > 0 are inside any chart F_i which contains P in its inside.

More precisely, condition (c) can be formulated as follows: if P is the image of a point inside {|u| < 1, |v| < 1} under the chart F_i, then there is an ε > 0 such that all points of S where c_P > 0 are contained in the image of {|u| < 1 − ε, |v| < 1 − ε} under F_i. It is not difficult to prove that for every point P of S there is such a function c_P [Exercise 1].


Lemma 


If such a function c_P is chosen for each point P of S, then it is possible to select a finite number of the functions c_P, say c_1, c_2, ..., c_K, such that c_1 + c_2 + ... + c_K > 0 at all points of S.


This lemma is a standard application of the Heine-Borel Theorem (§9.4). It can be proved as follows.

Suppose the lemma is false. Then at least one of the charts F_1, F_2, ..., F_N must have the property that no finite number of the c's can be chosen such that their sum is positive on the image of {|u| < 1, |v| < 1} under F_i. Otherwise there would be a finite number for each F_i, and the sum of all of these for all i would give a finite number whose sum was positive on all of S.

Choose such an F_i. Divide {|u| < 1, |v| < 1} into four quarters by the lines u = 0, v = 0. It follows in the same way that the image of at least one of the four quarters under F_i has the property that no finite number of c's can be chosen whose sum is positive on this subset of S. Choose such a quarter, divide it into quarters, and repeat the argument. One obtains in this way a nested sequence of squares R_0 ⊃ R_1 ⊃ R_2 ⊃ R_3 ⊃ ... such that:


(i) R_0 is {|u| < 1, |v| < 1}.
(ii) For each n, the square R_{n+1} is one of the four quarters of the square R_n.
(iii) No finite number of the c's can be chosen such that their sum is positive on the image of R_n under the map F_i.


The squares R_n close down to a single point P_0 in the uv-plane. Let P = F_i(P_0). Then the function c_P is positive near P. For large n, a single function c_P is therefore positive on F_i(R_n), which contradicts condition (iii). This contradicts the assumption that the lemma was false and thereby proves the lemma.
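The quartering argument above proceeds by contradiction, but it has a constructive relative that is easy to run. The Python sketch below is an illustration only, and the radius function r is hypothetical. It replaces the functions c_P by open disks of radius r(P) > 0 centred at points P of the closed square. It then keeps quartering a square until the disk belonging to the square's centre contains the whole square.

Because r is continuous and positive on the compact square, it is bounded below. The recursion therefore stops after finitely many steps and yields finitely many disks that cover the square.

```python
import math
import random

def r(x, y):
    # hypothetical continuous, positive radius function on the square
    return 0.05 + 0.5 * math.hypot(x, y)

def finite_cover(cx, cy, half, out):
    """Quarter the square with centre (cx, cy) and half-side `half` until the
    open disk of radius r(centre) about its centre contains the whole square."""
    if math.sqrt(2) * half < r(cx, cy):   # every point of the square is this close to the centre
        out.append((cx, cy))
        return
    q = half / 2
    for dx in (-q, q):
        for dy in (-q, q):
            finite_cover(cx + dx, cy + dy, q, out)

centres = []
finite_cover(0.0, 0.0, 1.0, centres)      # the square {|u| <= 1, |v| <= 1}
print(len(centres))

# every sampled point of the square lies in one of the finitely many chosen disks
random.seed(0)
pts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]
covered = all(any(math.hypot(x - cx, y - cy) < r(cx, cy) for cx, cy in centres)
              for x, y in pts)
print(covered)
```

Since r ≥ 0.05, no square is quartered more than five times, so at most 4^5 disks are chosen.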


Using the lemma, let c_1, c_2, ..., c_K be a finite collection of the c's such that c_1 + c_2 + ... + c_K is positive at all points of S, and set

$$a_\mu = \frac{c_\mu}{c_1 + c_2 + \cdots + c_K} \qquad (\mu = 1, 2, \ldots, K).$$

Then a_μ is a continuous function on xyz-space defined at all points of S. The function Σ_{μ=1}^{K} a_μ is identically 1 on S, and for this reason the a's are called a continuous partition of unity on S.

They can be used to effect the desired decomposition ω = ω_1 + ω_2 + ... + ω_N as follows. For each μ = 1, 2, ..., K it is possible to choose an i(μ), 1 ≤ i(μ) ≤ N, such that the points of S where a_μ > 0 are all contained in the image of some square {|u| < 1 − ε, |v| < 1 − ε} under the chart F_{i(μ)}. This follows from three facts:

- a_μ is a multiple of c_μ;
- the c's satisfy the conditions imposed on them above;
- every P in S is inside the image of at least one chart.

Now set

$$\omega_i = \sum_{i(\mu) = i} a_\mu\, \omega.$$

Then ω_i is zero at all points of S except those which are contained in the image of some square {|u| < 1 − ε, |v| < 1 − ε} under F_i (i = 1, 2, ..., N), and

$$\omega_1 + \omega_2 + \cdots + \omega_N = \sum_{\mu=1}^{K} a_\mu\, \omega = \omega,$$

as desired.
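Here is a minimal Python sketch of the construction a_μ = c_μ/(c_1 + ... + c_K) on the unit sphere. It assumes, for illustration only, two charts (one around each pole) and two piecewise-linear functions c_1, c_2 in the spirit of Exercise 1: c_1 is positive exactly where z < 1/2, and c_2 exactly where z > −1/2. Their sum is at least 1 on the sphere, so the quotients are defined.

```python
import math
import random

def c1(x, y, z):
    # >= 0 everywhere, > 0 exactly where z < 1/2 (a region around the south pole)
    return max(0.0, 0.5 - z)

def c2(x, y, z):
    # >= 0 everywhere, > 0 exactly where z > -1/2 (a region around the north pole)
    return max(0.0, z + 0.5)

def partition_of_unity(x, y, z):
    """Return (a1, a2) with a_mu = c_mu / (c1 + c2); the sum is positive on the sphere."""
    total = c1(x, y, z) + c2(x, y, z)
    return c1(x, y, z) / total, c2(x, y, z) / total

def random_point_on_sphere(rng):
    # normalise a Gaussian vector to get a point of the unit sphere
    while True:
        v = [rng.gauss(0, 1) for _ in range(3)]
        n = math.sqrt(sum(t * t for t in v))
        if n > 1e-12:
            return tuple(t / n for t in v)

rng = random.Random(1)
points = [random_point_on_sphere(rng) for _ in range(1000)]
values = [partition_of_unity(*p) for p in points]

# a1 + a2 is identically 1 on the sphere (up to rounding)
print(max(abs(a1 + a2 - 1.0) for a1, a2 in values))

# a1 vanishes wherever z >= 1/2, so omega_1 = a1 * omega is supported in the southern chart
print(all(a1 == 0.0 for (a1, a2), p in zip(values, points) if p[2] >= 0.5))
```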
=l 


The second statement of the theorem to be proved is that the integrals ∫ F_i*(ω_i) converge. Here the integrands are continuous 2-forms and the domain of integration is a square {|u| < 1, |v| < 1}. The proof of §2.3 therefore shows that these integrals converge, except that the uniform continuity of the integrands must be proved. This is easily done using a subdivision argument like the one used to prove the lemma above; this argument is given in §9.4 (Theorem 2).

The hard part of the theorem is of course the final statement (1), that ∫_S ω is independent of the choices.
This will be proved by first reducing it to the following 
simple case and by then proving this case. 




Independence of Parameter 


Let S be a compact, oriented, differentiable surface in xyz-space described by charts F_1, F_2, ..., F_N as above, and let ω be a continuous 2-form defined at all points of S. Suppose two different charts F_i, F_j both parameterize the portion of S where ω is not zero. That is, ω is zero at all points of S which are not in the image of {|u| < 1 − ε, |v| < 1 − ε} under F_i, and ω is zero at all points of S which are not in the image of {|u| < 1 − ε, |v| < 1 − ε} under F_j (ε a small positive number). Then

$$\int F_i^*(\omega) = \int F_j^*(\omega) \qquad (2)$$

where the integrals are over {|u| < 1, |v| < 1} oriented by du dv.
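Formula (2) can be illustrated numerically in the simplest setting: a 2-form ω = A dx dy in the xy-plane, which here plays the role of S. The sketch below makes the following assumptions, none of which come from the text:

- G is the identity chart.
- F is a hypothetical nonlinear, one-to-one chart with positive Jacobian.
- A is a bump that vanishes outside a small disk, so ω is zero away from the part both charts parameterize.

The pullback F*(A dx dy) is A(F(u, v)) times the Jacobian ∂(x, y)/∂(u, v), all times du dv. Both integrals are approximated by midpoint approximating sums over {|u| ≤ 1, |v| ≤ 1}. They should agree with each other, and with the exact value πR²/3, up to discretization error.

```python
import math

R = 0.4  # radius of the disk outside which A vanishes (illustrative choice)

def A(x, y):
    # continuous coefficient of omega = A dx dy; zero outside the disk of radius R
    s = (x * x + y * y) / (R * R)
    return (1.0 - s) ** 2 if s < 1.0 else 0.0

def F(u, v):
    # hypothetical nonlinear chart, one-to-one with positive Jacobian on the square
    return u + 0.25 * v * v, v + 0.1 * u ** 3

def jac_F(u, v):
    # d(x, y)/d(u, v) = det [[1, 0.5 v], [0.3 u^2, 1]] = 1 - 0.15 u^2 v > 0
    return 1.0 - 0.15 * u * u * v

def G(u, v):
    # the identity chart
    return u, v

def jac_G(u, v):
    return 1.0

def integral_of_pullback(chart, jac, n=400):
    """Midpoint approximating sum for the integral of chart*(A dx dy)
    = A(chart(u, v)) * jac(u, v) du dv over {|u| <= 1, |v| <= 1}."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        u = -1.0 + (i + 0.5) * h
        for j in range(n):
            v = -1.0 + (j + 0.5) * h
            x, y = chart(u, v)
            total += A(x, y) * jac(u, v)
    return total * h * h

I_F = integral_of_pullback(F, jac_F)
I_G = integral_of_pullback(G, jac_G)
exact = math.pi * R * R / 3.0   # integral of A over the plane
print(I_F, I_G, exact)
```

The support of A ∘ F lies inside {|u| < 0.47, |v| < 0.5}, so ω is zero outside the image of a smaller square under either chart, as (2) requires.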


To reduce the general case (1) to the special case (2) one can argue as follows. If F_1, F_2, ..., F_N and G_1, G_2, ..., G_M describe the same S, then the set of all N + M charts F_1, F_2, ..., F_N, G_1, G_2, ..., G_M also describes S, in that it satisfies the conditions (a)-(c). The proof above therefore gives a continuous partition of unity a_1, a_2, ..., a_K on S, built up out of functions c_P chosen to have the following property: if P is inside the image of {|u| < 1, |v| < 1} under any chart F_i or G_j, then all points of S where c_P > 0 are contained in the image of {|u| < 1 − ε, |v| < 1 − ε} under this chart.

This implies that for every μ = 1, 2, ..., K there are integers i(μ), 1 ≤ i(μ) ≤ N, and j(μ), 1 ≤ j(μ) ≤ M, such that all points of S where a_μ > 0 are contained both in the image of {|u| < 1 − ε, |v| < 1 − ε} under F_{i(μ)} and in the image of this set under G_{j(μ)}. Now if ω = ω_1 + ω_2 + ... + ω_N is any decomposition of ω as in the theorem, then for all i and μ

$$\int F_i^*(a_\mu \omega_i) = \int F_{i(\mu)}^*(a_\mu \omega_i)$$

by (2), because a_μ is zero outside F_{i(μ)} and ω_i is zero outside F_i. Summing over all μ gives

$$\int F_i^*(\omega_i) = \sum_{\mu=1}^{K} \int F_{i(\mu)}^*(a_\mu \omega_i)$$

because Σ_{μ=1}^{K} a_μ = 1. Summing this over i gives

$$\sum_{i=1}^{N} \int F_i^*(\omega_i) = \sum_{\mu=1}^{K} \int F_{i(\mu)}^*(a_\mu \omega)$$

because Σ_{i=1}^{N} ω_i = ω. Similarly

$$\sum_{j=1}^{M} \int G_j^*(\theta_j) = \sum_{\mu=1}^{K} \int G_{j(\mu)}^*(a_\mu \omega).$$

Finally, (2) gives

$$\int F_{i(\mu)}^*(a_\mu \omega) = \int G_{j(\mu)}^*(a_\mu \omega),$$

and summing over μ gives (1). This completes the reduction of the formula (1) to the formula (2).

The formula (2) will be proved by the same line of 
argument used to prove the lemma above; namely, it 
will be assumed that (2) is false, and this assumption will 
be contradicted by successively dividing the square {|u| < 1, |v| < 1} into quarters.

To simplify notation, set F = F_i and G = F_j. The statement to be proved is that if F and G both parameterize the part of S where ω ≠ 0, then ∫F*(ω) = ∫G*(ω). Assume this is false and let E be a positive number such that

$$\left| \int F^*(\omega) - \int G^*(\omega) \right| > E.$$

Let R_0 denote the square {|u| < 1, |v| < 1}, let R_0 be divided into quarters by the lines u = 0, v = 0, and let Σ(α) be an approximating sum to ∫G*(ω). Thus α represents a subdivision of {|u| < 1, |v| < 1} into small rectangles R_ij, together with a choice of a point P_ij in each R_ij. The sum Σ(α) contains one term for each rectangle R_ij, namely A(P_ij) times the area of R_ij, where A(u, v) du dv = G*(ω).

Any term of the sum Σ(α) which is not zero corresponds to a chosen point P_ij at which G*(ω) is not zero. Hence, by the assumption on ω, it corresponds to a chosen point P_ij such that G(P_ij) lies in the portion of S parameterized by F. Therefore F^{-1}[G(P_ij)] is defined and must lie in at least one of the four quarters of R_0. By moving P_ij slightly if necessary (which causes an arbitrarily small change in Σ(α)), it can be assumed that F^{-1}[G(P_ij)] lies in just one of the four quarters of R_0.

In this way, every approximating sum Σ(α) to ∫G*(ω) differs arbitrarily little from one which falls into four parts, Σ(α) = Σ_1(α) + Σ_2(α) + Σ_3(α) + Σ_4(α). Each part consists of those terms of Σ(α) for which F^{-1}[G(P_ij)] lies in the corresponding quarter of R_0. Now ∫_{R_0} F*(ω) can be written as a sum of four integrals, one for each quarter of R_0, and Σ(α) → ∫G*(ω) as |α| → 0. The assumption |∫F*(ω) − ∫G*(ω)| > E therefore implies that at least one quarter of R_0, say R_1, has the following property: for every δ > 0 there is an approximating sum Σ(α) to ∫G*(ω) such that |α| < δ and such that

$$\left| \int_{R_1} F^*(\omega) - \Sigma'(\alpha) \right| > \frac{E}{4}$$

where Σ'(α) consists of those (non-zero) terms of Σ(α) for which F^{-1}[G(P_ij)] is in R_1.

To see this, suppose the contrary. Then for each of the four quarters there would be a δ below which this inequality was impossible. Let δ be the smallest of these four values. Choose Σ(α) such that:

- |α| < δ;
- each of the non-zero terms of Σ(α) corresponds unambiguously to just one quarter of R_0;
- Σ(α) is close enough to ∫G*(ω) that |∫F*(ω) − Σ(α)| > E.

Then the assumption on δ would also give |∫F*(ω) − Σ(α)| ≤ 4 · (E/4) = E, so there is no such δ.


Now let R_1 be divided into quarters. The contention is that at least one of these four quarters, say R_2, must have the property that for every δ > 0 there is an approximating sum Σ(α) to ∫G*(ω) such that |α| < δ and such that

$$\left| \int_{R_2} F^*(\omega) - \Sigma''(\alpha) \right| > \frac{E}{16}$$

where Σ''(α) consists of those (non-zero) terms of Σ(α) for which F^{-1}[G(P_ij)] is in R_2.

This is proved by noting that if δ is given, then there is a Σ(α) such that |∫_{R_1} F*(ω) − Σ'(α)| > E/4. By changing the chosen points P_ij slightly, it can be assumed that for each term of Σ'(α) the corresponding point F^{-1}[G(P_ij)] lies in only one quarter of R_1. Splitting ∫_{R_1} F*(ω) into four parts, it follows that the desired inequality holds for this Σ(α) on at least one of the four quarters of R_1. If the inequality were impossible for sufficiently small δ on all four quarters, the assumption on R_1 would then be contradicted.

Repeating this process, it follows that there is a nested sequence of squares R_0 ⊃ R_1 ⊃ R_2 ⊃ ..., each square being a quarter of the preceding square, such that each R_n has the following property: given δ > 0, there is an approximating sum Σ(α) to ∫G*(ω) such that |α| < δ and such that

$$\left| \int_{R_n} F^*(\omega) - \Sigma_n(\alpha) \right| > \frac{E}{4^n} \qquad (3)$$

where Σ_n(α) is the sum of those (non-zero) terms of Σ(α) for which F^{-1}[G(P_ij)] is in R_n. It is to be shown that this is impossible.

The sums Σ_n(α) are approximating sums to the integral* ∫_{G^{-1}[F(R_n)]} G*(ω), where G^{-1}[F(R_n)] denotes the set of points in {|u| < 1, |v| < 1} whose images under G lie in F(R_n). The next step of the proof is to simplify the description of G^{-1}[F(R_n)].

*It has not been shown, however, that this integral converges.


Lemma 


Let P_0 be the point of the uv-plane common to all the rectangles R_n above. Then:

- P_0 lies inside {|u| < 1, |v| < 1};
- there is a point P_1 inside {|u| < 1, |v| < 1} such that F(P_0) = G(P_1);
- G^{-1} ∘ F is a well-defined differentiable function near P_0.


Proof 


Once the first two statements are proved, the third statement is an immediate consequence of the Implicit Function Theorem. The equation (x, y, z) = G(u, v) can then be solved near F(P_0) for (u, v, z) = g(x, y) [or (u, y, v) = g(x, z), or (x, u, v) = g(y, z)] in such a way that ‘(x, y, z) = G(u, v)’ is equivalent to ‘(x, y, z) is in S and (u, v, z) = g(x, y)’. Since F(u, v) is in S, the equation F(u_1, v_1) = G(u_2, v_2) can therefore be solved for (u_2, v_2) as differentiable functions of (u_1, v_1).

To prove the first two statements, note first that each of the squares R_n must contain a point Q_n such that ω is not zero at F(Q_n). Otherwise F*(ω) would be identically zero on R_n, Σ_n(α) could contain no non-zero terms, and (3) would be false.

By the assumption on ω, it follows that there is an ε such that Q_n is inside {|u| < 1 − ε, |v| < 1 − ε} for all n. Since P_0 = lim_{n→∞} Q_n, it follows that P_0 is in {|u| ≤ 1 − ε, |v| ≤ 1 − ε}, and hence inside {|u| < 1, |v| < 1}.

Moreover, for each Q_n there is a unique Q'_n in {|u| < 1 − ε, |v| < 1 − ε} such that F(Q_n) = G(Q'_n). By repeated subdivision of {|u| < 1, |v| < 1} (or by the Bolzano-Weierstrass Theorem of §9.4), it can be shown that there is a point P_1 of {|u| ≤ 1 − ε, |v| ≤ 1 − ε} with the following property: for every δ > 0, infinitely many of the points Q'_n lie within δ of P_1.

It follows that G(P_1) = F(P_0). Indeed, any point P of xyz-space other than F(P_0) has the property that there is an ε > 0 such that only a finite number of the points F(Q_n) = G(Q'_n) lie within ε of P, because the points F(Q_n) all lie near F(P_0). Hence G(P_1) ≠ P unless P = F(P_0). This completes the proof of the lemma.


Using the lemma, it follows that G^{-1} ∘ F is a differentiable function of (u, v) defined on all of R_n when n is sufficiently large. Let f = G^{-1} ∘ F and A du dv = G*(ω), so that F*(ω) = (G ∘ f)*(ω) = f*[G*(ω)] = f*[A du dv]. The sums Σ_n(α) are approximating sums to ∫_{f(R_n)} A du dv, and it is to be shown that

$$\left| \int_{R_n} f^*(A\,du\,dv) - \Sigma_n(\alpha) \right| > \frac{E}{4^n}$$

cannot hold for arbitrarily small |α|.

If it were known that the integrals ∫_{f(R_n)} A du dv converged, this would be a matter of proving that ∫_{f(R_n)} A du dv = ∫_{R_n} f*(A du dv) holds, at least with an error less than const./4^n as n → ∞. But it has not been proved that these integrals converge, so the approximating sums Σ_n(α) must be estimated instead.
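The identity F*(ω) = (G ∘ f)*(ω) = f*[G*(ω)] used above is the Chain Rule in disguise. For a 2-form ω = C dx dy on the plane, it says that the coefficient of F*(ω), namely C(F) times the Jacobian of F, equals C(G ∘ f) times the Jacobian of G at f, times the Jacobian of f. This is a simplification: in the text, G maps into xyz-space.

A small Python check follows. The maps G and f and the coefficient are hypothetical choices, and the Jacobians are computed by central differences.

```python
import math

def coeff(x, y):
    # hypothetical coefficient: omega = coeff(x, y) dx dy
    return math.exp(-x * x) * (1.0 + y * y)

def G(s, t):
    # hypothetical chart (polar-type); its Jacobian equals s, positive for s > 0
    return s * math.cos(t), s * math.sin(t)

def f(u, v):
    # hypothetical change of parameters; its Jacobian 0.5 - 0.04 v is positive here
    return 1.0 + 0.5 * u + 0.1 * v * v, v + 0.2 * u

def F(u, v):
    # F = G o f
    return G(*f(u, v))

def jac_det(m, u, v, h=1e-6):
    """Central-difference Jacobian determinant d(x, y)/d(u, v) of the map m."""
    xu, yu = [(a - b) / (2 * h) for a, b in zip(m(u + h, v), m(u - h, v))]
    xv, yv = [(a - b) / (2 * h) for a, b in zip(m(u, v + h), m(u, v - h))]
    return xu * yv - xv * yu

def pullback_coeff(m, c, u, v):
    """Coefficient of du dv in m*(c dx dy): c(m(u, v)) times the Jacobian of m."""
    return c(*m(u, v)) * jac_det(m, u, v)

def G_star_coeff(s, t):
    # coefficient of G*(omega) as a 2-form in the (s, t)-plane
    return pullback_coeff(G, coeff, s, t)

results = []
for u, v in [(0.0, 0.0), (0.3, -0.4), (-0.5, 0.7)]:
    direct = pullback_coeff(F, coeff, u, v)             # F*(omega)
    composed = pullback_coeff(f, G_star_coeff, u, v)    # f*[G*(omega)]
    results.append((direct, composed))
    print(u, v, direct, composed)
```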

The final step of the proof is to examine ∫_{R_n} f*(A du dv) and Σ_n(α) ‘under a microscope of power 2^n : 1’. To this end, let P_0 = (u_0, v_0) and P_1 = (u_1, v_1), and set

$$u = u_0 + \frac{h}{2^n},\quad v = v_0 + \frac{k}{2^n} \qquad \bigl(h = 2^n(u - u_0),\ k = 2^n(v - v_0)\bigr)$$

on the domain of f, and

$$u = u_1 + \frac{p}{2^n},\quad v = v_1 + \frac{q}{2^n} \qquad \bigl(p = 2^n(u - u_1),\ q = 2^n(v - v_1)\bigr)$$

on the range of f.

The square R_n in the uv-plane corresponds to a square R̂_n in the hk-plane. The square R̂_n is 2 × 2 and contains the point (0, 0); therefore R̂_n is contained in {|h| ≤ 2, |k| ≤ 2}. Let A_n dp dq be 4^n times the expression of A du dv in the coordinates (p, q). Then Σ_n(α) is 1/4^n times an approximating sum to ∫_{f(R̂_n)} A_n dp dq, and (3) becomes

$$\left| \int_{\hat R_n} f^*(A_n\,dp\,dq) - \hat\Sigma_n(\alpha) \right| > E \qquad (4)$$

where Σ̂_n(α) = 4^n Σ_n(α) is an approximating sum to ∫_{f(R̂_n)} A_n dp dq. It is to be shown that (4) cannot hold for arbitrarily small |α| for all n.

The idea is that ‘under a microscope’ f is nearly affine and A_n dp dq is nearly constant. Suppose f were actually affine and A_n dp dq were actually constant. Since f = G^{-1} ∘ F has positive Jacobian by assumption, the theorem of §6.2 would then assert that, as |α| → 0, the approximating sums Σ̂_n(α) converge to ∫_{f(R̂_n)} A_n dp dq = ∫_{R̂_n} f*(A_n dp dq), and (4) would be contradicted. The objective is to show that for n large enough the error in this approximation can be made less than E, for any E > 0.

Let L denote the map

$$p = a_1 h + a_2 k, \qquad q = a_3 h + a_4 k$$

(a_1, a_2, a_3, a_4 constants) which has the same partial derivatives as f at P_0 = (0, 0), relative to h, k, p, q. It was shown in §5.3 that L(h, k) can be made to differ from f(h, k) by less than any preassigned ε for all (h, k) in {|h| ≤ 2, |k| ≤ 2}, by making n sufficiently large (that is, by making the scale factor s = 2^{−n} sufficiently small).

In particular, let R̂_n^+ be a square slightly larger than R̂_n, say the square with the same center but with side 2 + 2ε instead of side 2. Then f(R̂_n) is contained in L(R̂_n^+) for all sufficiently large n. Similarly, if R̂_n^− is slightly smaller than R̂_n, then f(R̂_n) contains L(R̂_n^−).
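The ‘microscope’ can be watched at work numerically. The Python sketch below makes these assumptions:

- f is a hypothetical map, not one from the text.
- The point P_0 is chosen arbitrarily.
- The partial derivatives of f at P_0 are computed by hand for this particular f.

It rescales around P_0 by h = 2^n(u − u_0), k = 2^n(v − v_0), p = 2^n(u − u_1), q = 2^n(v − v_1), where P_1 = f(P_0). It then measures the largest deviation of the rescaled map from its linear part L on a grid in {|h| ≤ 2, |k| ≤ 2}. For a map with bounded second derivatives the deviation shrinks roughly like 2^{−n}, which is the property used above.

```python
def f(u, v):
    # hypothetical differentiable map of the uv-plane to itself
    return u + 0.25 * v * v, v + 0.1 * u ** 3

u0, v0 = 0.2, -0.1                    # P0 (arbitrary choice)
u1, v1 = f(u0, v0)                    # P1 = f(P0)
# partial derivatives of f at P0, computed by hand for this f
a1, a2 = 1.0, 0.5 * v0                # dp/dh, dp/dk
a3, a4 = 0.3 * u0 ** 2, 1.0           # dq/dh, dq/dk

def max_deviation(n, steps=40):
    """Max over a grid in {|h| <= 2, |k| <= 2} of the sup-norm distance between
    f seen under the microscope of power 2^n : 1 and the linear map L."""
    s = 2.0 ** -n                     # scale factor
    worst = 0.0
    for i in range(steps + 1):
        h = -2.0 + 4.0 * i / steps
        for j in range(steps + 1):
            k = -2.0 + 4.0 * j / steps
            x, y = f(u0 + s * h, v0 + s * k)
            p, q = (x - u1) / s, (y - v1) / s           # rescaled image point
            lp, lq = a1 * h + a2 * k, a3 * h + a4 * k   # the linear map L
            worst = max(worst, abs(p - lp), abs(q - lq))
    return worst

devs = [max_deviation(n) for n in range(13)]
for n, d in enumerate(devs):
    print(n, d)
```

For this particular f, the deviation is exactly 0.25·k²·2^{−n} in p plus a smaller term in q, so each doubling of the magnification halves the error.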
The number ∫_{R̂_n} f*(A_n dp dq) will now be compared to Σ̂_n(α) in several steps. It will be shown that each step can be made small by making n large and |α| small, hence contradicting (4).

First, ∫_{R̂_n} f*(A_n dp dq) differs arbitrarily little from

$$\int_{\hat R_n} L^*\bigl(A(P_1)\,dp\,dq\bigr)$$

because these integrands differ arbitrarily little from each other throughout R̂_n when n is large. Next,

$$\int_{\hat R_n} L^*\bigl(A(P_1)\,dp\,dq\bigr) = \int_{L(\hat R_n)} A(P_1)\,dp\,dq$$

by the theorem of §6.2.

Next, Σ̂_n(α) differs arbitrarily little from the corresponding approximating sum Σ̃_n(α) to ∫_{f(R̂_n)} A(P_1) dp dq. This is because the integrands differ by arbitrarily little on f(R̂_n), and because the total area of the squares specified by α for which P_ij is in f(R̂_n) is bounded when |α| is bounded.

It remains to show that the approximating sums Σ̃_n(α) to ∫_{f(R̂_n)} A(P_1) dp dq differ arbitrarily little from ∫_{L(R̂_n)} A(P_1) dp dq. If A(P_1) = 0, then both are zero and this is trivially true. Otherwise one can divide by A(P_1), and it suffices to show that the approximating sums to ∫_{f(R̂_n)} dp dq differ arbitrarily little from ∫_{L(R̂_n)} dp dq.

But f(R̂_n) is contained in L(R̂_n^+) and contains L(R̂_n^−). It follows that, as |α| → 0, the approximating sums to ∫_{f(R̂_n)} dp dq all lie between ∫_{L(R̂_n^−)} dp dq = ∫_{R̂_n^−} L*(dp dq) and ∫_{L(R̂_n^+)} dp dq = ∫_{R̂_n^+} L*(dp dq). Since these two numbers lie arbitrarily close to ∫_{R̂_n} L*(dp dq) = ∫_{L(R̂_n)} dp dq, the desired conclusion follows. This completes the proof of the theorem.

Exercises

1 Prove that there exists a function c_P for each P as stated in the proof of the theorem. [Remember that c_P can be (in fact must be) a very artificial function. Use the function f(x) which is 0 for x < −1, which is 1 for x > 0, and which is 1 + x for x between −1 and 0.] Prove that in fact c_P can be chosen to be differentiable, and that this gives a differentiable partition of unity a_1, a_2, ..., a_K. This fact is used in the proof of Stokes’ Theorem.

[Figure: graphs of y = f(x) and y = f(−x).]


2 Show that if P is inside the image of two different charts F_i, F_j, then F_j^{-1} ∘ F_i is a well-defined differentiable function of the uv-plane to itself defined near F_i^{-1}(P). [This is actually proved in the text.] Prove that this map F_j^{-1} ∘ F_i has positive Jacobian if and only if F_i*(ω) is a positive multiple of F_j*(ω) for all 2-forms ω. [Use the Chain Rule.]

3 Show that the torus of Exercise 7, §2.5, is a compact, oriented, differentiable surface in xyz-space, that is, that it can be described by charts F_1, F_2, ..., F_N satisfying (a)-(c). [This is not as simple as one might expect. Three charts can be used.]

6.4 Manifolds-with-Boundary and Stokes’ Theorem

*This means that the rectangle includes its boundary. This is analogous to the definition of a closed interval {a ≤ x ≤ b} as an interval which includes its endpoints x = a, x = b. Specifically, each R_i is to be a set of the form {a ≤ u ≤ b, c ≤ v ≤ d} where −1 < a < b < 1 and −1 < c < d < 1.


To state and prove Stokes’ Theorem ∫_{∂S} ω = ∫_S dω (see §3.4), one must first define the integrals it involves. This is an extremely difficult definition to make. Even the very elaborate definition of §6.3 is not yet adequate, because it applies only to integrals ∫_S ω in which the domain of integration S has no boundary ∂S. However,
the needed definitions require only a slight modification 
of the definition of §6.3. 


Definition 


A compact, oriented, differentiable surface-with-boundary 
in xyz-space is a set S which can be described by a finite 
number of oriented charts as follows: 


(a) A finite number of differentiable maps F_1, F_2, ..., F_N of the uv-plane to xyz-space are given. Each F_i is one-to-one and non-singular of rank 2 on the square {|u| < 1, |v| < 1}. The maps F_i are called ‘charts’.

(b) For each chart F_i there is specified a closed* rectangle R_i in {|u| < 1, |v| < 1} such that the image of a point of {|u| < 1, |v| < 1} under F_i is a point of S if and only if the point lies in R_i.

(c) For each P in S there is at least one chart F_i such that P lies inside the image of {|u| < 1, |v| < 1} under F_i and such that S near P is F_i(R_i). That is, for each P in S there are an i, 1 ≤ i ≤ N, and an ε > 0 with two properties: P = F_i(ū, v̄) where |ū| < 1, |v̄| < 1; and a point Q in xyz-space which lies within ε of P lies in S if and only if it lies in F_i(R_i).

(d) The orientations of the charts F_i agree in the following sense. If P is a point of S which lies in the image of two charts, say F_i and F_j, and if ω is a 2-form on xyz-space defined at P, then F_i*(ω) at F_i^{-1}(P) is a positive multiple of F_j*(ω) at F_j^{-1}(P).




If S is a compact, oriented, differentiable surface-with-boundary, then the boundary of S, denoted ∂S, consists of those points of S which are the image under some F_i of a point inside {|u| < 1, |v| < 1} which lies on the boundary of R_i.

Note that a ‘surface’ in the sense of §6.3 is also a ‘surface-with-boundary’ in the sense just defined. [Take R_i to be all of {|u| < 1, |v| < 1} for i = 1, 2, ..., N.]
Thus the boundary is optional and a surface-with- 
boundary need not have a boundary. Unfortunately 
there is no accepted terminology which avoids this 
linguistic absurdity. 


Theorem 


Let S be a compact, oriented, differentiable surface-with-boundary in xyz-space, and let ω = A dy dz + B dz dx + C dx dy be a continuous 2-form on xyz-space which is defined at all points of S. Then a number ∫_S ω depending only on ω and S can be defined as follows.

Let F_1, F_2, ..., F_N and R_1, R_2, ..., R_N be a specific description of S by charts F_i and rectangles R_i satisfying (a)-(d) above. Then there exist continuous 2-forms ω_1, ω_2, ..., ω_N defined at all points of S such that ω = ω_1 + ω_2 + ... + ω_N and such that ω_i is zero at all points of S other than those which lie in the image under F_i of a square {|u| < 1 − δ_i, |v| < 1 − δ_i} contained in {|u| < 1, |v| < 1}. The integrals

$$\int_{R_i} F_i^*(\omega_i),$$

where F_i*(ω_i) is the pullback of ω_i under F_i and where R_i is oriented by du dv, all converge. Their sum depends only on ω and S, so that the definition

$$\int_S \omega = \sum_{i=1}^{N} \int_{R_i} F_i^*(\omega_i)$$

is valid.

Similarly, let ω = A dx + B dy + C dz be a continuous 1-form on xyz-space which is defined at all points of ∂S. Then a number ∫_{∂S} ω depending only on ω and S can be defined as follows. Let F_1, F_2, ..., F_N and R_1, R_2, ..., R_N be a specific description of S. Then there exist continuous 1-forms ω_1, ω_2, ..., ω_N defined at all points of ∂S such that ω = ω_1 + ω_2 + ... + ω_N and such that ω_i is zero at all points of ∂S other than those which lie in the image under F_i of a square {|u| < 1 − δ_i, |v| < 1 − δ_i} contained in {|u| < 1, |v| < 1}. Each of the integrals

$$\int_{\partial R_i} F_i^*(\omega_i)$$

can be defined as a sum of 4 simple integrals of the form ∫_a^b A(t) dt, by orienting each of the 4 sides of ∂R_i by the counterclockwise convention. The sum

$$\int_{\partial S} \omega = \sum_{i=1}^{N} \int_{\partial R_i} F_i^*(\omega_i)$$

is independent of the choices and is therefore a valid definition of ∫_{∂S} ω.

Proof

The proof of this theorem is virtually identical to the proof of the theorem of §6.3:

- The decompositions ω = ω_1 + ω_2 + ... + ω_N can be accomplished using a partition of unity as before.
- The convergence of the integrals ∫_{R_i} F_i*(ω_i) and ∫_{∂R_i} F_i*(ω_i) follows as before.
- The proof that the definition is independent of the choices is reduced as before to the case where the 2-form (resp. 1-form) is zero except at points of S (resp. ∂S) which are contained inside two different charts.

The only change in the proof that ∫_S ω is independent of the choices is that the map f = G^{-1} ∘ F is not necessarily defined at all points near P_0 (if F(P_0) is on ∂S). This causes no difficulty, because there is still a differentiable map f defined near P_0 which agrees with G^{-1} ∘ F at all points of F^{-1}(S).

In the proof that ∫_{∂S} ω is independent of the choices, one further fact is needed. Let F, G be charts in which S corresponds to R_F, R_G respectively. It must be shown that, locally, the map G^{-1} ∘ F, where it is defined, carries the sides of R_F inside {|u| < 1, |v| < 1} (if there are any) to sides of R_G in a non-singular, orientation-preserving way. This is not difficult to prove (see Exercise 1).

Thus integrals over S and ∂S are defined in terms of integrals over rectangles and their boundaries. Since Stokes’ Theorem for rectangles was proved in Chapter 3, it is only a short step to the proof of the general Stokes theorem. For the sake of simplicity it will be assumed that the surface S is twice differentiable, that is, that S can be described by charts F_1, F_2, ..., F_N which have the property that F_i*(ω) is differentiable whenever ω is differentiable.*

*This is equivalent to saying that the 3 functions of 2 variables which describe F_i have continuous second partial derivatives; see Exercise 2.




Stokes’ Theorem 


Let S be a compact, oriented, twice differentiable surface-with-boundary in xyz-space, and let ω = A dx + B dy + C dz be a differentiable 1-form defined at all points of S. Then

$$\int_{\partial S} \omega = \int_S d\omega.$$
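Before the proof, a numerical sanity check may help. It tests the two facts the proof combines: d[F*(ω)] = F*(dω), and Stokes’ Theorem for a rectangle. The Python sketch below assumes, for illustration only:

- the chart F(u, v) = (u, v, uv), a piece of the saddle z = xy;
- the 1-form ω = z dx + x dy;
- the rectangle R = {0 ≤ u ≤ 1, 0 ≤ v ≤ 2}.

Here F*(ω) = uv du + u dv and dω = dz dx + dx dy. So d[F*(ω)] and F*(dω) should both equal (1 − u) du dv, and both sides of Stokes’ Theorem should equal 1.

```python
# Chart and its partial derivatives (hypothetical example)
def F(u, v):
    return u, v, u * v

def F_u(u, v):
    return 1.0, 0.0, v

def F_v(u, v):
    return 0.0, 1.0, u

def omega(x, y, z):
    # coefficients (A, B, C) of omega = A dx + B dy + C dz = z dx + x dy
    return z, x, 0.0

def pullback_1form(u, v):
    """Return (P, Q) with F*(omega) = P du + Q dv."""
    A, B, C = omega(*F(u, v))
    xu, yu, zu = F_u(u, v)
    xv, yv, zv = F_v(u, v)
    return A * xu + B * yu + C * zu, A * xv + B * yv + C * zv

def d_of_pullback(u, v, h=1e-5):
    """Coefficient of du dv in d[F*(omega)] = (dQ/du - dP/dv) du dv (central differences)."""
    dQ_du = (pullback_1form(u + h, v)[1] - pullback_1form(u - h, v)[1]) / (2 * h)
    dP_dv = (pullback_1form(u, v + h)[0] - pullback_1form(u, v - h)[0]) / (2 * h)
    return dQ_du - dP_dv

def pullback_of_d_omega(u, v):
    """Coefficient of du dv in F*(d omega), where d omega = dz dx + dx dy."""
    xu, yu, zu = F_u(u, v)
    xv, yv, zv = F_v(u, v)
    dz_dx = zu * xv - zv * xu     # coefficient of F*(dz dx)
    dx_dy = xu * yv - xv * yu     # coefficient of F*(dx dy)
    return dz_dx + dx_dy

def boundary_integral(a, b, c, d, n=1000):
    """Integral of P du + Q dv counterclockwise around the rectangle (midpoint rule)."""
    total = 0.0
    hu, hv = (b - a) / n, (d - c) / n
    for i in range(n):
        u = a + (i + 0.5) * hu
        total += pullback_1form(u, c)[0] * hu      # bottom side, left to right
        total -= pullback_1form(u, d)[0] * hu      # top side, right to left
        v = c + (i + 0.5) * hv
        total += pullback_1form(b, v)[1] * hv      # right side, upward
        total -= pullback_1form(a, v)[1] * hv      # left side, downward
    return total

def area_integral(g, a, b, c, d, n=200):
    """Midpoint approximating sum for the integral of g du dv over the rectangle."""
    hu, hv = (b - a) / n, (d - c) / n
    return sum(g(a + (i + 0.5) * hu, c + (j + 0.5) * hv)
               for i in range(n) for j in range(n)) * hu * hv

lhs = boundary_integral(0.0, 1.0, 0.0, 2.0)
rhs = area_integral(d_of_pullback, 0.0, 1.0, 0.0, 2.0)
rhs2 = area_integral(pullback_of_d_omega, 0.0, 1.0, 0.0, 2.0)
print(lhs, rhs, rhs2)
```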


Proof

Let charts F_1, F_2, ..., F_N and rectangles R_1, R_2, ..., R_N be given describing S as above. Then it is possible to write ω = ω_1 + ω_2 + ... + ω_N, where ω_i is a differentiable 1-form which is zero on S except inside F_i(R_i). This is done by constructing a differentiable partition of unity by the method of §6.3; to do this one need only begin with differentiable functions c_P (see Exercise 1 of §6.3).

Thus it suffices to prove

$$\int_{\partial R_i} F_i^*(\omega_i) = \int_{R_i} F_i^*(d\omega_i),$$

since the sums over i of these numbers are ∫_{∂S} ω and ∫_S dω by the definition of these integrals (dω = dω_1 + dω_2 + ... + dω_N, and dω_i is zero on S except on F_i(R_i)). But it was proved in §3.2 that

$$\int_{\partial R_i} F_i^*(\omega_i) = \int_{R_i} d\bigl[F_i^*(\omega_i)\bigr],$$

provided the 1-form F_i*(ω_i) on the uv-plane is differentiable. Thus it suffices to show that d[F_i*(ω_i)] = F_i*(dω_i). Let

$$\omega_i = A\,dx + B\,dy + C\,dz, \qquad d\omega_i = dA\,dx + dB\,dy + dC\,dz.$$

The Chain Rule implies that the pullback of dA is d of the pullback of A. (Consider A as a map from xyz-space to a line on which the coordinate is A, so that dA is the pullback of oriented length on the A-line.) Now if D is any differentiable function (0-form) on the uv-plane and φ is any differentiable 1-form, then d(Dφ) = dD·φ + D·dφ, as is easily seen from the definition of the operation d.

Thus, considering A, dx, B, dy, C, dz as forms on the uv-plane, d of the pullback of A dx + B dy + C dz is dA dx + dB dy + dC dz + A d[dx] + B d[dy] + C d[dz]. The desired formula is therefore reduced to the formula d[dx] = 0 (and similarly d[dy] = 0, d[dz] = 0), where dx is considered as a 1-form on the uv-plane. The assumption that F_i is twice differentiable implies that dx is differentiable; hence, by Stokes’ Formula,

$$\int_R d[dx] = \int_{\partial R} dx$$

for all rectangles R. But the Fundamental Theorem, applied to the 4 integrals making up ∫_{∂R} dx, gives cancellations at the 4 vertices. Hence ∫_{∂R} dx = 0, and so ∫_R d[dx] = 0 for all rectangles R. Dividing by the area of R and letting R shrink to a point P gives d[dx] = 0 at P for all P; hence d[dx] = 0. This completes the proof of Stokes’ Theorem.

Exercises

1 Let R = {a ≤ x ≤ b, c ≤ y ≤ d} be a rectangle in the xy-plane, and let f: R² → R² be a differentiable map, say

$$x = f_1(u, v), \qquad y = f_2(u, v),$$

defined and orientation preserving (∂(x, y)/∂(u, v) > 0) near (u, v) = (0, 0). Suppose f carries points (u, v) with u ≤ 0 to points of R and points (u, v) with u > 0 to points not in R. Show that f carries the line u = 0 near (0, 0) to one of the sides of R, and carries the orientation dv of u = 0 to the counterclockwise orientation of ∂R. [First consider the case where f is affine. Then note that if the conditions hold for f then they also hold for f under a ‘microscope’.]


2 Show that a differentiable map

$$y_i = f_i(x_1, x_2, \ldots, x_n) \qquad (i = 1, 2, \ldots, m)$$

for which the functions f_i have continuous second partial derivatives has the property that the pullback f*(ω) of a differentiable k-form ω is differentiable. Show that, conversely, if the pullback of every differentiable 1-form is differentiable, then the f_i have continuous second partial derivatives.


3 Show that if P_0, P_1, P_2 are three non-collinear points in xyz-space, then the oriented triangle they describe is a compact, oriented, differentiable surface-with-boundary in xyz-space, that is, that it can be described by charts F_1, F_2, ..., F_N satisfying (a)-(d). [The description of this simple surface-with-boundary by charts is not very simple.]


6.5 General Properties of Integrals

*Specifically, each R_i is to be a set of the form {a_1 ≤ u_1 ≤ b_1, a_2 ≤ u_2 ≤ b_2, ..., a_k ≤ u_k ≤ b_k} where −1 < a_j < b_j < 1 for j = 1, 2, ..., k.


The case k = 2, n = 3 was considered in §6.3 and §6.4 
merely to simplify the notation. The generalization to 
arbitrary k and n presents no additional difficulties. 


Definition 


A compact, oriented, differentiable, k-dimensional manifold-with-boundary in x_1x_2...x_n-space is a subset S of x_1x_2...x_n-space which can be described by a finite number of oriented charts as follows:


(a) A finite number of differentiable maps F_1, F_2, ..., F_N of u_1u_2...u_k-space to x_1x_2...x_n-space are given. Each F_i is one-to-one and non-singular of rank k on the k-dimensional cube {|u_j| < 1; j = 1, 2, ..., k}. The maps F_i are called ‘charts’.

(b) For each chart F_i there is specified a closed* k-dimensional rectangle R_i in {|u_j| < 1; j = 1, 2, ..., k} such that the image of a point of {|u_j| < 1; j = 1, 2, ..., k} under F_i is a point of S if and only if the point lies in R_i.

(c) For each P in S there is an i, 1 ≤ i ≤ N, such that P = F_i(ū_1, ū_2, ..., ū_k) where |ū_j| < 1 (j = 1, 2, ..., k), and there is an ε > 0 with the following property: any point Q = (x_1, x_2, ..., x_n) which lies within ε of P (each of the n coordinates x_j of Q differs by less than ε from the corresponding coordinate of P) lies in S if and only if it lies in F_i(R_i).

(d) The orientations of the charts F_i agree in the following sense. If P is a point of S which lies in the image of two charts, say F_i and F_j, and if ω is a k-form on x_1x_2...x_n-space defined at P, then F_i*(ω) at F_i^{-1}(P) is a positive multiple of F_j*(ω) at F_j^{-1}(P).


Theorem 


Let S be a compact, oriented, differentiable, k-dimensional manifold-with-boundary in x_1x_2...x_n-space, and let ω be a continuous k-form on x_1x_2...x_n-space defined at all points of S. Then a number ∫_S ω depending only on ω and S can be defined in the same way as before. Similarly, if ω is a continuous (k − 1)-form defined at all points of ∂S, then a number ∫_{∂S} ω depending only on ω and S can be defined.

This second definition depends on a convention for orienting the boundary ∂R of a k-dimensional rectangle R in u_1u_2...u_k-space. The convention is that the side u_1 = b_1 of the rectangle {a_1 ≤ u_1 ≤ b_1, a_2 ≤ u_2 ≤ b_2, ..., a_k ≤ u_k ≤ b_k} is oriented by du_2 du_3 ... du_k, and the remaining sides are oriented accordingly. The orientation of the remaining sides is determined by the convention that the map u_1 → u_j, u_j → −u_1 preserves orientations.

Note that if k = 1, then S is a curve, and ∫_{∂S} ω for ω a 0-form (function) is a sum over the endpoints ∂S of S. Each endpoint is counted with a sign determined by the orientation of S.


The proof of this theorem is exactly the same as in the 
case n = 3, k = 2 of §6.4. 

A manifold-with-boundary is said to be twice differentiable if it can be described by charts $F_i$ which have the property that the pullback of any differentiable form under $F_i$ is itself differentiable. This is equivalent to saying that the first partial derivatives of $F_i$ are functions which have continuous first partial derivatives.


Stokes’ Theorem 


Let $S$ be a compact, oriented, twice differentiable, k-dimensional manifold-with-boundary in $\mathbb{R}^n$ and let $\omega$ be a differentiable $(k-1)$-form on $\mathbb{R}^n$ defined at all points of $S$. Then

$$\int_{\partial S} \omega = \int_S d\omega$$

where $d\omega$ is the k-form on $\mathbb{R}^n$ defined by the formula

$$d(A\,dx_1\,dx_2\cdots dx_{k-1} + \cdots) = dA\,dx_1\,dx_2\cdots dx_{k-1} + \cdots$$

of Chapter 3. (The assumption that $S$ is twice differentiable is not actually necessary; it is made for the sake of convenience.)


Proof 


A differentiable partition of unity writes $\omega = \omega_1 + \omega_2 + \cdots + \omega_N$, where $\omega_i$ is zero outside the image of the chart $F_i$. This reduces the theorem immediately to the formula

$$\int_{\partial R_i} F_i^*(\omega_i) = \int_{R_i} F_i^*(d\omega_i).$$

By examining the definition of the operation $d$, and by using the fact that $F_i$ is twice differentiable, one can show that $d[F_i^*(\omega_i)] = F_i^*(d\omega_i)$. Setting $\sigma = F_i^*(\omega_i)$, this reduces the theorem to

$$\int_{\partial R} \sigma = \int_R d\sigma,$$

that is, to Stokes' Formula for a 'rectangle'. This special case was proved in Chapter 3. (It suffices to consider the case $\sigma = A\,du_2\,du_3\cdots du_k$. Let $R'$ denote the rectangle which is the projection of $R$ on the $u_2u_3\ldots u_k$-plane. The orientation of $\partial R$ is defined in such a way that

$$\int_{\partial R} \sigma = \int_{R'} \left[A(b_1, u_2, \ldots, u_k) - A(a_1, u_2, \ldots, u_k)\right] du_2\,du_3\cdots du_k.$$

By the Fundamental Theorem this is

$$\int_{R'} \left(\int_{a_1}^{b_1} \frac{\partial A}{\partial u_1}(u_1, u_2, \ldots, u_k)\,du_1\right) du_2\,du_3\cdots du_k,$$

which, by the formula for a multiple integral as an iterated integral, is

$$\int_R \frac{\partial A}{\partial u_1}\,du_1\,du_2\cdots du_k = \int_R d\sigma,$$

as was to be shown.)
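
For k = 2 the computation in the parenthesis above can be checked numerically. In the sketch below the coefficient $A(u_1, u_2) = \sin(u_1)\,u_2^2$, the rectangle $R = \{0 \le u_1 \le 1,\ 0 \le u_2 \le 2\}$, the midpoint rule, and the helper names `midpoint_1d` and `midpoint_2d` are all illustrative assumptions. They do not come from the text. Both sides should approximate $\frac{8}{3}\sin 1$.

```python
import math

def midpoint_1d(g, a, b, n=400):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def midpoint_2d(g, a1, b1, a2, b2, n=400):
    """Iterated midpoint rule over the rectangle [a1, b1] x [a2, b2]."""
    return midpoint_1d(lambda u1: midpoint_1d(lambda u2: g(u1, u2), a2, b2, n), a1, b1, n)

A = lambda u1, u2: math.sin(u1) * u2 ** 2          # sigma = A du2 (hypothetical choice)
dA_du1 = lambda u1, u2: math.cos(u1) * u2 ** 2     # d(sigma) = (dA/du1) du1 du2

a1, b1, a2, b2 = 0.0, 1.0, 0.0, 2.0
# Only the sides u1 = b1 (+) and u1 = a1 (-) contribute to the boundary integral of A du2.
boundary = midpoint_1d(lambda u2: A(b1, u2) - A(a1, u2), a2, b2)
interior = midpoint_2d(dA_du1, a1, b1, a2, b2)
print(boundary, interior, math.sin(1.0) * 8 / 3)
```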
In addition to Stokes’ Theorem, the following prop- 
erties of integrals can now be stated and proved: 


I. Linearity in $\omega$. If $\omega = a_1\omega_1 + a_2\omega_2$, where $a_1, a_2$ are numbers and $\omega_1, \omega_2$ are continuous k-forms on $S$, then

$$\int_S \omega = a_1\int_S \omega_1 + a_2\int_S \omega_2.$$


II. Decomposition of $S$. If the (compact, oriented, differentiable) k-dimensional manifold(-with-boundary) $S$ is divided into two k-dimensional manifolds (compact, etc., -with-boundary) $S_1$, $S_2$ by a $(k-1)$-dimensional manifold, then

$$\int_S \omega = \int_{S_1} \omega + \int_{S_2} \omega.$$


III. Orientation. If the orientation of $S$ is reversed, then the integral $\int_S \omega$ changes sign. (The orientation of integrals is a constant source of aggravation. However, without orientations there can be no cancellation on the interior boundaries, and hence no Stokes' Theorem or Fundamental Theorem of Calculus.)

*The case $n = m = k$ of this theorem is known as the formula for change of variable in a multiple integral. Note that the conditions guarantee that $f(S)$ is a compact, oriented, differentiable, k-dimensional manifold-with-boundary, so that $\int_{f(S)} \omega$ is defined.

†Taking $\omega_2 = 1$, this is Stokes' Theorem. On the other hand, setting $\omega = \omega_1\,\omega_2$ in Stokes' Theorem and using the formula $d(\omega_1\,\omega_2) = d\omega_1\,\omega_2 + (-1)^{k_1}\omega_1\,d\omega_2$ gives the formula for integration by parts.

IV. Independence of parameter. The formula

$$\int_{f(S)} \omega = \int_S f^*(\omega)$$

holds whenever $S$ is a compact, oriented, differentiable, k-dimensional manifold-with-boundary, $f\colon \mathbb{R}^m \to \mathbb{R}^n$ is a differentiable map which is one-to-one and non-singular of rank k on $S$, $\omega$ is a continuous k-form defined at all points of $f(S)$, and $f(S)$ is oriented in accord with $S$.*

V. Microscope. Suppose that $M_s\colon \mathbb{R}^n \to \mathbb{R}^n$ is the map of $(h_1, h_2, \ldots, h_n)$ to $(x_1, x_2, \ldots, x_n)$ defined by

$$x_i = \bar x_i + s h_i \qquad (i = 1, 2, \ldots, n),$$

that $S$ is a compact, oriented, differentiable, k-dimensional manifold-with-boundary in $h_1h_2\ldots h_n$-space, and that $\omega$ is a continuous k-form on $x_1x_2\ldots x_n$-space defined near $(\bar x_1, \bar x_2, \ldots, \bar x_n)$. Then

$$\lim_{s \to 0} \frac{1}{s^k}\int_{M_s(S)} \omega = \int_S \bar\omega,$$

where $\bar\omega$ is the constant k-form in $h_1, h_2, \ldots, h_n$ obtained by evaluating $\omega$ at $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ and changing $dx_i$ to $dh_i$. (This gives a precise meaning to the statement that the integral of a k-form over a small k-dimensional manifold is nearly the value of a constant k-form on the manifold.) In particular, if $S$ is the boundary of a $(k+1)$-dimensional manifold, then

$$\lim_{s \to 0} \frac{1}{s^k}\int_{M_s(S)} \omega = 0.$$
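
Property V can be illustrated numerically for k = 1, n = 2. The sketch below makes several hypothetical choices: the 1-form $\omega = xy\,dx + x^2\,dy$, the point $(\bar x_1, \bar x_2) = (1, 2)$ (so that $\bar\omega = 2\,dh_1 + dh_2$), and the helper names `line_integral` and `M`. The line integrals are approximated by the midpoint rule along polygonal paths. For the segment $S$ from $(0,0)$ to $(1,1)$, $\int_S \bar\omega = 3$. For the boundary of the unit square, the scaled integral should tend to 0.

```python
def line_integral(A, B, path, n=2000):
    """Midpoint-rule approximation of the integral of A dx + B dy along a polygonal path."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dx, dy = x1 - x0, y1 - y0
        for k in range(n):
            t = (k + 0.5) / n
            x, y = x0 + t * dx, y0 + t * dy
            total += (A(x, y) * dx + B(x, y) * dy) / n
    return total

A = lambda x, y: x * y        # hypothetical coefficients of omega = A dx + B dy
B = lambda x, y: x ** 2
xbar = (1.0, 2.0)

def M(s, pts):
    """The 'microscope' map x_i = xbar_i + s h_i applied to the vertices of a path."""
    return [(xbar[0] + s * h1, xbar[1] + s * h2) for h1, h2 in pts]

segment = [(0.0, 0.0), (1.0, 1.0)]
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]   # a closed boundary

for s in (0.1, 0.01, 0.001):
    # (1/s^k) * integral over M_s(S), with k = 1
    print(s, line_integral(A, B, M(s, segment)) / s, line_integral(A, B, M(s, square)) / s)
```

As $s$ shrinks, the first column tends to $A(\bar x) + B(\bar x) = 3$ and the second tends to 0.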


VI. Integration by parts. Suppose that $\omega_1$ is a differentiable $k_1$-form, that $\omega_2$ is a differentiable $k_2$-form, and that $S$ is a compact, oriented, twice-differentiable $(k_1 + k_2 + 1)$-dimensional manifold-with-boundary such that $\omega_1, \omega_2$ are defined at all points of $S$. Then†

$$\int_S d\omega_1\,\omega_2 = \int_{\partial S} \omega_1\,\omega_2 - (-1)^{k_1}\int_S \omega_1\,d\omega_2.$$


VII. Approximating sums. Suppose that $S$ is a compact, oriented, differentiable, 2-dimensional manifold-with-boundary in the xy-plane, that the orientation of $S$ agrees with $dx\,dy$ at all points, and that $\omega = A\,dx\,dy$ is a continuous 2-form defined at all points of $S$. Then the definition of $\int_S \omega$ given above coincides with the definition given in §2.3. That is, suppose $S$ is enclosed in a rectangle (which is possible by compactness; see §9.4) and approximating sums $\sum(\omega)$ are formed as in §2.3. Then the approximating sums converge to the number $\int_S \omega$ defined in this chapter. Analogous theorems apply to integrals $\int_S \omega$ in which $S$ is a k-dimensional manifold-with-boundary in a k-dimensional space.

Proofs

I, III, IV, V and VI follow immediately from the preceding theorems. II requires a more rigorous formulation of the manner in which $S$ is divided into $S_1 \cup S_2$, and its proof will be omitted. To prove VII, one can first write $\omega = \omega_1 + \omega_2 + \cdots + \omega_N$, where $\omega_i$ is zero outside of one chart. VII is then essentially the statement that $\int \omega_i$ can be computed using either of two charts, which was proved in §6.3.

Exercises

1 Show the following. Let $S$ be a compact, oriented, differentiable, k-dimensional manifold-with-boundary with the additional property that, for each chart $F_i$, the rectangle $R_i$ has at most one side inside $\{|u_j| < 1;\ j = 1, 2, \ldots, k\}$. Then $\partial S$ is a compact, oriented, differentiable, $(k-1)$-dimensional manifold-with-boundary (although $\partial S$ has no boundary), and it can be oriented so that the two definitions of $\int_{\partial S} \omega$ agree.


2 The integral $\int_{\partial R} (A\,dx + B\,dy)$ of a 1-form over the boundary of a rectangle $R = \{a \le x \le b,\ c \le y \le d\}$ can be defined explicitly by the formula

$$\int_{\partial R} (A\,dx + B\,dy) = \int_c^d \left[B(b, y) - B(a, y)\right] dy + \int_a^b \left[A(x, c) - A(x, d)\right] dx.$$

Give the analogous formula for the integral of a $(k-1)$-form over the boundary of a k-dimensional rectangle.


6.6 Integrals as Functions of S


A useful insight into the meaning of $\int_S \omega$ can be obtained by considering it as a function of $S$ for fixed $\omega$.


Definition 


A k-integral on $\mathbb{R}^n$ is a function assigning numbers to compact, oriented, differentiable, k-dimensional manifolds-with-boundary in $\mathbb{R}^n$ which is of the form $S \mapsto \int_S \omega$ for some continuous k-form $\omega$ on $\mathbb{R}^n$.


By property V of §6.5, the integrand $\omega$ is determined by the values of its integrals $\int_S \omega$. One can think of the k-form $\omega = A\,dx_1\,dx_2\cdots dx_k + \cdots$ as giving the value of the k-integral $\int_S \omega$ on 'infinitesimal' oriented rectangles. For instance, it gives $A(P)$ times the oriented area of $S$ when $S$ is an infinitesimal rectangle in the $x_1x_2\ldots x_k$-direction at the point $P$. These values are then sufficient to determine all values of $\int_S \omega$ by a process of integration.
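
The 'integration' step can be made concrete for k = 1. Evaluate $\omega$ on many short chords of a curve and add the results. The sketch below is only an illustration. The form $\omega = -y\,dx + x\,dy + dz$, the helix $(\cos t, \sin t, t)$ for $0 \le t \le 2\pi$, the name `k_integral_1`, and the choice of evaluating the coefficients at chord midpoints are all hypothetical. For this example the exact value is $\int_0^{2\pi} (\sin^2 t + \cos^2 t + 1)\,dt = 4\pi$.

```python
import math

def k_integral_1(omega, curve, t0, t1, n=2000):
    """Approximate the 1-integral of omega over a parametrized curve by summing
    the values of omega on n small chords (coefficients taken at chord midpoints)."""
    total = 0.0
    prev = curve(t0)
    for k in range(1, n + 1):
        cur = curve(t0 + (t1 - t0) * k / n)
        mid = tuple((p + c) / 2 for p, c in zip(prev, cur))
        delta = tuple(c - p for p, c in zip(prev, cur))        # (dx, dy, dz) of the chord
        total += sum(coef * d for coef, d in zip(omega(*mid), delta))
        prev = cur
    return total

omega = lambda x, y, z: (-y, x, 1.0)        # coefficients (A, B, C) of A dx + B dy + C dz
helix = lambda t: (math.cos(t), math.sin(t), t)
work = k_integral_1(omega, helix, 0.0, 2 * math.pi)
print(work, 4 * math.pi)
```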

From this point of view, Stokes' Theorem says essentially that if $\omega$ is a differentiable k-form, then the function $S \mapsto \int_{\partial S} \omega$ is a $(k+1)$-integral. The integrand $d\omega$ which gives this integral is found by examining the value of $S \mapsto \int_{\partial S} \omega$ on 'infinitesimal' rectangles $S$; this is essentially a process of differentiation.

Similarly, the independence of parameter of $\int_S \omega$ can be regarded as the statement that if $f\colon \mathbb{R}^n \to \mathbb{R}^n$ is non-singular of rank n, then the function $S \mapsto \int_{f(S)} \omega$ is a k-integral. The integrand $f^*(\omega)$ which gives this integral is found by examining the value of $S \mapsto \int_{f(S)} \omega$ on infinitesimal rectangles. For infinitesimal rectangles, $f$ is an affine map and $\omega$ is a constant form, so $f^*(\omega)$ can be found by the algebraic methods of Chapter 4.

The elaborate definitions of this chapter are necessary in order to define the domain of a k-integral $S \mapsto \int_S \omega$, that is, to define the category of k-dimensional domains $S$ for which integrals $\int_S \omega$ are defined. It is intuitively rather clear what sorts of domains $S$ are to be considered as domains of integration. The essential aspects of the theory of k-integrals can therefore be developed, as they were in the preceding chapters, without first defining precisely the category of compact, oriented, differentiable, k-dimensional manifolds-with-boundary.

In most applications $\int_S \omega$ arises as a k-integral; that is, $\omega$ is fixed and $\int_S \omega$ is considered as a function of $S$. For example, 'work' is a 1-integral assigning numbers to oriented curves in xyz-space. The integrand $\omega = A\,dx + B\,dy + C\,dz$ which describes this 1-integral (by giving its values on infinitesimal line segments in the coordinate directions) is essentially the force field. More generally, a k-form $\omega$ is what physicists call a field* (of alternating covariant tensors). The meaning of the 'field' $\omega$ is that it is an integrand whose integrals $\int_S \omega$ are physically meaningful quantities. The field gives the values of these quantities for infinitesimal k-dimensional rectangles in coordinate directions, and the values for more general domains $S$ are found by integration.

*However, physicists also consider other sorts of 'fields' which are not k-forms.

Exercises

1 What is the natural definition of a '0-integral'? Of a 'compact, oriented 0-dimensional manifold'?


2 Describe 'mass' as a 3-integral in which the integrand is $\rho\,dx\,dy\,dz$, where $\rho$ = density.


Chapter 7

Practical Methods of Solution

7.1 Successive Approximation

Chapters 4 and 5 were devoted to statements about the nature of the solution of an equation $y = f(x)$ for $x$ given $y$. This chapter, by contrast, is devoted to methods of actually solving equations $y = f(x)$ for $x$ given $y$. The phrase 'actually solving' will be interpreted to mean 'describing a numerical process for determining the solution to any prescribed degree of accuracy'. Thus the equation $x^2 = 2$ is not 'solved' by writing $x = \pm\sqrt{2}$, but only by giving some method of extracting the root $x = \pm 1.4142\ldots$ to any prescribed number of decimal places. Defining 'solution' in this way is tantamount to saying that it must take the form of a process of successive approximation. That is, it must be a procedure which enables one to compute a succession of approximate solutions $x^{(1)}, x^{(2)}, x^{(3)}, \ldots$* which give the actual solution to a greater and greater degree of accuracy. This first section is devoted to proving the Elimination Theorem by such a method of successive approximation. Recall the statement of the theorem:

*The letter $x$ will denote a point $x = (x_1, x_2, \ldots, x_n)$ in $\mathbb{R}^n$, which prevents the use of subscripts to denote more than one $x$. The superscript notation $x^{(N)} = (x_1^{(N)}, x_2^{(N)}, \ldots, x_n^{(N)})$ will be used instead.


Elimination Theorem 


Let $y = f(x_1, x_2, \ldots, x_n)$ be a differentiable function, and let $\bar y = f(\bar x_1, \bar x_2, \ldots, \bar x_n)$ be a point where $\partial y/\partial x_1 \ne 0$. Then there is a differentiable function $g(y, x_2, x_3, \ldots, x_n)$, defined near $(\bar y, \bar x_2, \bar x_3, \ldots, \bar x_n)$, such that the relations

$$y = f(x_1, x_2, \ldots, x_n), \qquad x_1 = g(y, x_2, \ldots, x_n)$$

are equivalent near $(\bar y, \bar x_1, \bar x_2, \ldots, \bar x_n)$.

H.M. Edwards, Advanced Calculus: A Differential Forms Approach, Modern Birkhauser Classics, DOI 10.1007/978-0-8176-8412-9_7, © Harold M. Edwards 2014


Proof

Let $a$ be the value of $\partial y/\partial x_1$ at $(\bar x_1, \bar x_2, \ldots, \bar x_n)$. The successive approximations will be based on the estimate

$$\Delta y \approx a\,\Delta x_1 \tag{1}$$

of the change in $y$ which results from changing $x_1$ while keeping $x_2, x_3, \ldots, x_n$ fixed. Since $a \ne 0$ by assumption, this gives

$$\Delta x_1 \approx \frac{\Delta y}{a} \tag{2}$$

as an approximation to the $\Delta x_1$ which will produce a given $\Delta y$. Thus suppose $(y, x_2, x_3, \ldots, x_n)$ are given and $x_1^{(N)}$ is an approximate solution of $y = f(x_1, x_2, \ldots, x_n)$, that is, $y \approx f(x_1^{(N)}, x_2, x_3, \ldots, x_n)$. Then the formula (2) says that to produce

$$\text{desired } \Delta y = y - f(x_1^{(N)}, x_2, x_3, \ldots, x_n)$$

one should use approximately

$$\Delta x_1 \approx \frac{y - f(x_1^{(N)}, x_2, x_3, \ldots, x_n)}{a}.$$

That is, the 'correction' of the approximate solution $x_1^{(N)}$ should be

$$x_1^{(N+1)} = x_1^{(N)} + \frac{y - f(x_1^{(N)}, x_2, \ldots, x_n)}{a}. \tag{3}$$

The given $(y, x_2, x_3, \ldots, x_n)$ will be assumed to be near $(\bar y, \bar x_2, \bar x_3, \ldots, \bar x_n)$. A 'zeroth approximation' to a solution $x_1$ of $y = f(x_1, x_2, \ldots, x_n)$ would therefore be

$$x_1^{(0)} = \bar x_1. \tag{4}$$
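
The iteration (3), started from (4), takes only a few lines of code. The sketch below treats the case n = 1 and applies the method to $x^2 = 2$ with $\bar x = 1$, so that $a = f'(1) = 2$ is held fixed throughout. The function name `successive_approximation`, its parameters `tol` and `max_iter`, and the stopping rule (stop once a step is shorter than `tol`) are hypothetical choices; the text does not prescribe them.

```python
from typing import Callable, List

def successive_approximation(f: Callable[[float], float], a: float, y: float,
                             x_bar: float, tol: float = 1e-12,
                             max_iter: int = 200) -> List[float]:
    """Return the approximations x^(0), x^(1), ... produced by formula (3),
    x^(N+1) = x^(N) + (y - f(x^(N))) / a, with the slope a held fixed."""
    xs = [x_bar]                      # zeroth approximation, formula (4)
    for _ in range(max_iter):
        x = xs[-1]
        x_next = x + (y - f(x)) / a   # the correction, formula (3)
        xs.append(x_next)
        if abs(x_next - x) < tol:     # step lengths shrink by roughly a factor rho
            break
    return xs

# Solve x**2 = 2 starting from x_bar = 1, where a = f'(1) = 2.
xs = successive_approximation(lambda x: x * x, a=2.0, y=2.0, x_bar=1.0)
print(xs[:4])   # [1.0, 1.5, 1.375, 1.4296875]
print(xs[-1])   # approximately 1.41421356237...
```

Here the ratio of successive steps approaches $|1 - \sqrt{2}| \approx 0.41$, which plays the role of the number $\rho$ introduced below.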


The main step in the proof of the Elimination Theorem is to prove the following statement. Suppose the given $(y, x_2, x_3, \ldots, x_n)$ lie sufficiently near to $(\bar y, \bar x_2, \bar x_3, \ldots, \bar x_n)$. Then the sequence of successive approximations (3), with the initial approximation (4), converges, i.e. $\lim_{N\to\infty} x_1^{(N)}$ exists, and the limit of the sequence is a solution $x_1$ of the equation $y = f(x_1, x_2, \ldots, x_n)$. It will then be shown that the function $g(y, x_2, x_3, \ldots, x_n)$ defined by $\lim_{N\to\infty} x_1^{(N)}$ is differentiable.

The first step in proving that (3) converges to a solution is, of course, to estimate the error in the approximation (1) on which it is based. As usual, this is done by using the Fundamental Theorem to write

$$\Delta y = \int_{x_1^{(N)}}^{x_1^{(N+1)}} \frac{\partial y}{\partial x_1}\,dx_1.$$

Suppose $\partial y/\partial x_1$ differs from $a$ by less than $\epsilon$ at all points of the interval of integration. Then this integral shows that $\Delta y$ lies between $(a + \epsilon)(x_1^{(N+1)} - x_1^{(N)})$ and $(a - \epsilon)(x_1^{(N+1)} - x_1^{(N)})$. Hence

$$|\Delta y - a\,\Delta x_1| \le \epsilon\,|\Delta x_1|, \tag{5}$$

where $\Delta x_1$ is the difference between the two given values $x_1^{(N+1)}$, $x_1^{(N)}$, and $\Delta y$ is the difference between the corresponding values $f(x_1^{(N+1)}, x_2, x_3, \ldots, x_n)$, $f(x_1^{(N)}, x_2, x_3, \ldots, x_n)$. This estimate, together with (3), gives


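$$\begin{aligned}
x_1^{(N+1)} - x_1^{(N)} &= \left[x_1^{(N)} + \frac{y - f(x_1^{(N)}, x_2, \ldots, x_n)}{a}\right] - \left[x_1^{(N-1)} + \frac{y - f(x_1^{(N-1)}, x_2, \ldots, x_n)}{a}\right] \\
&= x_1^{(N)} - x_1^{(N-1)} - \frac{f(x_1^{(N)}, x_2, \ldots, x_n) - f(x_1^{(N-1)}, x_2, \ldots, x_n)}{a} \\
&= \frac{1}{a}\left[a\,\Delta x_1 - \Delta y\right],
\end{aligned}$$

$$\left|x_1^{(N+1)} - x_1^{(N)}\right| = \frac{1}{|a|}\left|\Delta y - a\,\Delta x_1\right| \le \frac{\epsilon}{|a|}\left|x_1^{(N)} - x_1^{(N-1)}\right|,$$

where now $\Delta x_1 = x_1^{(N)} - x_1^{(N-1)}$. This shows that the Nth step is at most $\epsilon/|a|$ times as long as the $(N-1)$st step, provided $x_1^{(N)}$ and $x_1^{(N-1)}$ lie in the region where the estimate (5) is valid.

Choose a positive number* $\rho < 1$ and set $\epsilon = \rho|a|$. Then there is a $\delta > 0$ such that $\partial y/\partial x_1$ differs from $a$ by less than $\epsilon$ at all points $(x_1, x_2, \ldots, x_n)$ within $\delta$ of $(\bar x_1, \bar x_2, \ldots, \bar x_n)$; hence (5) applies in this region. Suppose $(x_2, x_3, \ldots, x_n)$ lies within $\delta$ of $(\bar x_2, \bar x_3, \ldots, \bar x_n)$ and $x_1^{(N)}$, $x_1^{(N-1)}$ both lie within $\delta$ of $\bar x_1$. Then the estimate

$$\left|x_1^{(N+1)} - x_1^{(N)}\right| \le \rho\left|x_1^{(N)} - x_1^{(N-1)}\right| \tag{6}$$

holds; that is, the size of the step is decreased by a factor of $\rho$.

*$\rho$ = rho. In this context it stands for 'ratio', namely, the ratio of $|x_1^{(N+1)} - x_1^{(N)}|$ to $|x_1^{(N)} - x_1^{(N-1)}|$.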

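Now let $(y, x_2, x_3, \ldots, x_n)$ be given. Using the initial approximation (4) gives

$$x_1^{(1)} = \bar x_1 + \frac{y - f(\bar x_1, x_2, x_3, \ldots, x_n)}{a}.$$

Thus $x_1^{(1)}$ lies within $\delta$ of $\bar x_1$ provided the number

$$d = \left|\frac{y - f(\bar x_1, x_2, x_3, \ldots, x_n)}{a}\right|$$

is less than $\delta$. When this is the case, the estimate (6) applies to $|x_1^{(2)} - x_1^{(1)}|$ and gives

$$\begin{aligned}
|x_1^{(2)} - \bar x_1| &\le |x_1^{(2)} - x_1^{(1)}| + |x_1^{(1)} - \bar x_1| \\
&\le (\rho + 1)\,|x_1^{(1)} - \bar x_1| \\
&= (\rho + 1)\,d.
\end{aligned}$$

Therefore $x_1^{(2)}$ lies within $\delta$ of $\bar x_1$ provided

$$(\rho + 1)\,d < \delta.$$

When this is the case, the estimate (6) applies to $|x_1^{(3)} - x_1^{(2)}|$ and gives

$$\begin{aligned}
|x_1^{(3)} - \bar x_1| &\le |x_1^{(3)} - x_1^{(2)}| + |x_1^{(2)} - \bar x_1| \\
&\le \rho\,|x_1^{(2)} - x_1^{(1)}| + (\rho + 1)\,d \\
&\le \rho^2\,|x_1^{(1)} - \bar x_1| + (\rho + 1)\,d \\
&= (\rho^2 + \rho + 1)\,d.
\end{aligned}$$

Therefore $x_1^{(3)}$ lies within $\delta$ of $\bar x_1$ provided

$$(\rho^2 + \rho + 1)\,d < \delta.$$

Repeating this argument N times, one finds that $x_1^{(N)}$ lies within $\delta$ of $\bar x_1$ provided

$$(\rho^{N-1} + \rho^{N-2} + \cdots + \rho^2 + \rho + 1)\,d < \delta.$$

Multiplying by the positive number $1 - \rho$, this inequality can be restated

$$(1 - \rho^N)\,d < (1 - \rho)\,\delta.$$

This holds for all N if (and only if)

$$d \le (1 - \rho)\,\delta.$$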


This proves the following. Suppose $(y, x_2, x_3, \ldots, x_n)$ are given such that $|x_i - \bar x_i| < \delta$ $(i = 2, 3, \ldots, n)$ and such that

$$\left|\frac{y - f(\bar x_1, x_2, \ldots, x_n)}{a}\right| < (1 - \rho)\,\delta,$$

where $0 < \rho < 1$ and $\delta$ is chosen as above once $\rho$ is given. Then the sequence $x_1^{(N)}$ defined by (3) with the initial approximation (4) lies entirely in the interval $\{|x_1 - \bar x_1| < \delta\}$, and the estimate (6) always applies. But the estimate (6) shows that the Cauchy Convergence Criterion is satisfied, because for $M > N$ it gives

$$\begin{aligned}
|x_1^{(M)} - x_1^{(N)}| &\le |x_1^{(M)} - x_1^{(M-1)}| + |x_1^{(M-1)} - x_1^{(M-2)}| + \cdots + |x_1^{(N+1)} - x_1^{(N)}| \\
&\le (\rho^{M-1} + \rho^{M-2} + \cdots + \rho^N)\,|x_1^{(1)} - x_1^{(0)}| \\
&= (\rho^{M-1} + \rho^{M-2} + \cdots + \rho^N)\left|\frac{y - f(\bar x_1, x_2, \ldots, x_n)}{a}\right| \\
&\le (\rho^{M-1} + \rho^{M-2} + \cdots + \rho^N)(1 - \rho)\,\delta \\
&= (\rho^N - \rho^M)\,\delta \le \rho^N\delta.
\end{aligned}$$

This not only proves that the sequence $x_1^{(N)}$ is convergent but also gives an explicit estimate of the rate of convergence: all terms past $x_1^{(N)}$ differ from $x_1^{(N)}$ by at most $\rho^N\delta$.
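Let $x_1^{(\infty)} = \lim_{N\to\infty} x_1^{(N)}$. Then passing to the limit in (3) gives

$$x_1^{(\infty)} = x_1^{(\infty)} + \frac{y - f(x_1^{(\infty)}, x_2, \ldots, x_n)}{a},$$

which implies

$$y = f(x_1^{(\infty)}, x_2, x_3, \ldots, x_n).$$

That is, $\lim_{N\to\infty} x_1^{(N)}$ is indeed a solution of the given equation. Moreover, let $\tilde x_1$ be any solution of the given equation which satisfies $|\tilde x_1 - \bar x_1| < \delta$. Since $y - f(\tilde x_1, x_2, \ldots, x_n) = 0$,

$$\begin{aligned}
|x_1^{(N+1)} - \tilde x_1| &= \left|x_1^{(N)} + \frac{y - f(x_1^{(N)}, x_2, \ldots, x_n)}{a} - \tilde x_1 - \frac{y - f(\tilde x_1, x_2, \ldots, x_n)}{a}\right| \\
&= \frac{1}{|a|}\left|a\,(x_1^{(N)} - \tilde x_1) - \left[f(x_1^{(N)}, x_2, \ldots, x_n) - f(\tilde x_1, x_2, \ldots, x_n)\right]\right| \\
&\le \rho\,|x_1^{(N)} - \tilde x_1|.
\end{aligned}$$

Thus $x_1^{(N+1)}$ is nearer to $\tilde x_1$ than $x_1^{(N)}$ is, which implies that $\lim_{N\to\infty} x_1^{(N)} = \tilde x_1$. [More formally, passing to the limit in the inequality above gives $|x_1^{(\infty)} - \tilde x_1| \le \rho\,|x_1^{(\infty)} - \tilde x_1|$. Hence $(1 - \rho)\,|x_1^{(\infty)} - \tilde x_1| \le 0$, so $|x_1^{(\infty)} - \tilde x_1| \le 0$ and $x_1^{(\infty)} = \tilde x_1$.]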


In summary, suppose $(y, x_2, x_3, \ldots, x_n)$ satisfies

$$|x_2 - \bar x_2| < \delta,\quad |x_3 - \bar x_3| < \delta,\quad \ldots,\quad |x_n - \bar x_n| < \delta,$$
$$|y - f(\bar x_1, x_2, x_3, \ldots, x_n)| < |a|\,(1 - \rho)\,\delta,$$

which is true for all $(y, x_2, x_3, \ldots, x_n)$ near $(\bar y, \bar x_2, \bar x_3, \ldots, \bar x_n)$. Then there is a unique solution $x_1$ of the equation $y = f(x_1, x_2, \ldots, x_n)$ in the interval $\{|x_1 - \bar x_1| < \delta\}$, and this solution is the limit of the sequence defined by (3) and (4). Here $a$ is $\partial y/\partial x_1$ at $(\bar x_1, \bar x_2, \ldots, \bar x_n)$, $\rho$ is an arbitrary number in the range $0 < \rho < 1$, and $\delta$ is chosen so that $\partial y/\partial x_1$ differs from $a$ by less than $\rho|a|$ throughout the 'cube' $\{|x_i - \bar x_i| < \delta;\ i = 1, 2, \ldots, n\}$.
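This defines* a function $g(y, x_2, x_3, \ldots, x_n)$ near $(\bar y, \bar x_2, \bar x_3, \ldots, \bar x_n)$ with the desired property. It remains to show that this function is differentiable. Let $x_1^{(N)}$ now denote the sequence (3), (4) for the point $(\bar y + sh_1, \bar x_2 + sh_2, \ldots, \bar x_n + sh_n)$. Since $g(\bar y, \bar x_2, \ldots, \bar x_n) = \bar x_1 = x_1^{(0)}$, telescoping gives

$$\frac{g(\bar y + sh_1, \bar x_2 + sh_2, \ldots, \bar x_n + sh_n) - g(\bar y, \bar x_2, \ldots, \bar x_n)}{s} = \frac{x_1^{(1)} - \bar x_1}{s} + \frac{x_1^{(2)} - x_1^{(1)}}{s} + \frac{x_1^{(3)} - x_1^{(2)}}{s} + \cdots$$

The first term is the principal term, and the remaining terms satisfy

$$\left|\frac{x_1^{(2)} - x_1^{(1)}}{s} + \frac{x_1^{(3)} - x_1^{(2)}}{s} + \cdots\right| \le (\rho + \rho^2 + \cdots)\left|\frac{x_1^{(1)} - \bar x_1}{s}\right| = \frac{\rho}{1 - \rho}\,\bigl|\text{first term}\bigr|.$$

The first term is given explicitly by the formula

$$\frac{x_1^{(1)} - \bar x_1}{s} = \frac{\bar y + sh_1 - f(\bar x_1, \bar x_2 + sh_2, \ldots, \bar x_n + sh_n)}{sa} = \frac{1}{a}\left[h_1 - \frac{f(\bar x_1, \bar x_2 + sh_2, \ldots, \bar x_n + sh_n) - f(\bar x_1, \bar x_2, \ldots, \bar x_n)}{s}\right].$$

*In the strict sense that the sequence (3), (4) can be used to compute the value of $g$ to any prescribed degree of accuracy.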


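By making $s$ small, this can be made to differ arbitrarily little from

$$\left(\frac{\partial y}{\partial x_1}\right)^{-1}\left(h_1 - \frac{\partial y}{\partial x_2}h_2 - \cdots - \frac{\partial y}{\partial x_n}h_n\right),$$

where the partial derivatives are evaluated at $(\bar x_1, \bar x_2, \ldots, \bar x_n)$. The remaining terms are at most $\rho/(1 - \rho)$ times this term in absolute value. But $\rho$ can be made arbitrarily small (when $\delta$ is sufficiently small, which means $s$ must also be small), so this proves that

$$\lim_{s \to 0} \frac{g(\bar y + sh_1, \bar x_2 + sh_2, \ldots, \bar x_n + sh_n) - g(\bar y, \bar x_2, \ldots, \bar x_n)}{s}$$

exists* and is equal to

$$\left(\frac{\partial y}{\partial x_1}\right)^{-1}\left(h_1 - \frac{\partial y}{\partial x_2}h_2 - \cdots - \frac{\partial y}{\partial x_n}h_n\right).$$

That is, the partial derivatives of $g$ at $(\bar y, \bar x_2, \bar x_3, \ldots, \bar x_n)$ exist and can be found by implicit differentiation. Now let $(\hat y, \hat x_2, \hat x_3, \ldots, \hat x_n)$ be any point at which $g$ is defined. Then $g$ is the solution of $y = f(x_1, x_2, \ldots, x_n)$ near $\hat y = f(\hat x_1, \hat x_2, \ldots, \hat x_n)$, where $\hat x_1 = g(\hat y, \hat x_2, \ldots, \hat x_n)$. By the above argument, the partial derivatives of $g$ at $(\hat y, \hat x_2, \ldots, \hat x_n)$ therefore exist and are equal to

$$\frac{\partial g}{\partial y} = \left(\frac{\partial y}{\partial x_1}\right)^{-1},\quad \frac{\partial g}{\partial x_2} = -\left(\frac{\partial y}{\partial x_1}\right)^{-1}\frac{\partial y}{\partial x_2},\quad \ldots,\quad \frac{\partial g}{\partial x_n} = -\left(\frac{\partial y}{\partial x_1}\right)^{-1}\frac{\partial y}{\partial x_n},$$

where these partial derivatives are evaluated at $(g(\hat y, \hat x_2, \ldots, \hat x_n), \hat x_2, \ldots, \hat x_n)$. It follows that $g$ is continuous (otherwise its partial derivatives could not exist), and therefore that its partial derivatives are continuous (because they are compositions of continuous functions). Therefore $g$ is differentiable, and the proof is complete.

*Moreover, if $f$ is uniformly differentiable in the sense of §9.3, these estimates prove that this limit is approached uniformly; that is, $g$ is uniformly differentiable.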
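
The formulas for the partial derivatives of $g$ can be checked numerically. In the sketch below, the function $f(x_1, x_2) = x_1^3 + x_1 + x_2^2$, the base point $(\bar x_1, \bar x_2) = (1, 1)$, the step $h$, and the helper names `solve_for_x1` and `g` are hypothetical choices. Here $g$ is computed by the iteration (3), (4) with $a = \partial y/\partial x_1(1, 1) = 4$. Central differences of $g$ are then compared with the predicted values $\partial g/\partial y = 1/4$ and $\partial g/\partial x_2 = -(\partial y/\partial x_2)/(\partial y/\partial x_1) = -2/4$.

```python
def solve_for_x1(f, a, x1_bar, y, x2, tol=1e-14, max_iter=200):
    """Successive approximation (3) with initial value (4); returns x1 with f(x1, x2) close to y."""
    x1 = x1_bar
    for _ in range(max_iter):
        step = (y - f(x1, x2)) / a
        x1 += step
        if abs(step) < tol:
            break
    return x1

def f(x1, x2):
    return x1 ** 3 + x1 + x2 ** 2

x1_bar, x2_bar = 1.0, 1.0
y_bar = f(x1_bar, x2_bar)          # 3.0
a = 3 * x1_bar ** 2 + 1            # dy/dx1 at the base point, 4.0

def g(y, x2):
    return solve_for_x1(f, a, x1_bar, y, x2)

h = 1e-6
dg_dy = (g(y_bar + h, x2_bar) - g(y_bar - h, x2_bar)) / (2 * h)
dg_dx2 = (g(y_bar, x2_bar + h) - g(y_bar, x2_bar - h)) / (2 * h)
# implicit differentiation predicts dg/dy = 1/a and dg/dx2 = -(dy/dx2)/a = -2*x2/a
print(dg_dy, 1 / a)
print(dg_dx2, -2 * x2_bar / a)
```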


Using the Elimination Theorem, the Implicit Function 
Theorem can now be proved by the method of step-by- 
step elimination as in §5.3. The Implicit Function 
Theorem can also be proved directly by the method of 
successive approximations as follows: 

It suffices to show that r equations in n unknowns

$$y_i = f_i(x_1, x_2, \ldots, x_n) \qquad (i = 1, 2, \ldots, r) \tag{7}$$

can be solved locally for

$$x_i = g_i(y_1, \ldots, y_r, x_{r+1}, \ldots, x_n) \qquad (i = 1, 2, \ldots, r)$$

provided

$$\frac{\partial(y_1, y_2, \ldots, y_r)}{\partial(x_1, x_2, \ldots, x_r)} \ne 0, \tag{8}$$

because the remaining statements of the Implicit Function Theorem follow from the Chain Rule. To solve the equations (7) near a given point

$$\bar y_i = f_i(\bar x_1, \bar x_2, \ldots, \bar x_n) \qquad (i = 1, 2, \ldots, r)$$

at which (8) is satisfied, one can use the method of successive approximations. Suppose $(y_1, y_2, \ldots, y_r, x_{r+1}, \ldots, x_n)$ is given near $(\bar y_1, \bar y_2, \ldots, \bar y_r, \bar x_{r+1}, \ldots, \bar x_n)$, together with an approximate solution $(x_1^{(N)}, x_2^{(N)}, \ldots, x_r^{(N)})$ of (7). The approximation can be 'corrected' as follows. Write

$$\Delta y_i = \sum_{j=1}^{r} \frac{\partial y_i}{\partial x_j}\,\Delta x_j \qquad (i = 1, 2, \ldots, r), \tag{9}$$

where the partial derivatives are evaluated at* $(\bar x_1, \bar x_2, \ldots, \bar x_n)$. Set $\Delta y_i$ equal to

$$\text{desired } \Delta y_i = y_i - f_i(x_1^{(N)}, \ldots, x_r^{(N)}, x_{r+1}, \ldots, x_n), \tag{10}$$

solve (9) for $\Delta x_j$ $(j = 1, 2, \ldots, r)$, which is possible by the assumption (8), and set

$$x_j^{(N+1)} = x_j^{(N)} + \Delta x_j \qquad (j = 1, 2, \ldots, r). \tag{11}$$

Together with the initial approximation

$$x_j^{(0)} = \bar x_j \qquad (j = 1, 2, \ldots, r), \tag{12}$$

the equations (9), (10), (11) define a sequence of successive approximations $(x_1^{(N)}, x_2^{(N)}, \ldots, x_r^{(N)})$ to the solution of the given equations. When $r = 1$ this is precisely the sequence (3), (4) used in the proof above. By the same method of proof, it can be shown that the sequence $(x_1^{(N)}, x_2^{(N)}, \ldots, x_r^{(N)})$ converges to a solution $(x_1^{(\infty)}, x_2^{(\infty)}, \ldots, x_r^{(\infty)})$ of the given equations for $(y_1, y_2, \ldots, y_r, x_{r+1}, \ldots, x_n)$ sufficiently near $(\bar y_1, \bar y_2, \ldots, \bar y_r, \bar x_{r+1}, \ldots, \bar x_n)$. It can also be shown that this solution depends differentiably on $(y_1, y_2, \ldots, y_r, x_{r+1}, \ldots, x_n)$.

*It would seem more reasonable to evaluate the partial derivatives at $(x_1^{(N)}, x_2^{(N)}, \ldots, x_r^{(N)}, x_{r+1}, \ldots, x_n)$. This is Newton's method (see §7.3). It gives a more efficient process for solving the equations, but the proof of convergence is more difficult.
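
The iteration (9)-(12) can be sketched for r = 2, n = 3. As in (9), the partial derivatives stay frozen at the base point; this is not the Newton's method of the footnote. The system $F$, the base point $\bar x = (1, 1, 0)$, the target values, and the helper names `solve_2x2` and `chord_solve` are hypothetical. For r = 2 the linear system (9) is solved here by Cramer's rule.

```python
def solve_2x2(L, b):
    """Solve the 2 x 2 linear system L u = b by Cramer's rule (assumes det L != 0)."""
    (p, q), (r, s) = L
    det = p * s - q * r
    return ((s * b[0] - q * b[1]) / det, (p * b[1] - r * b[0]) / det)

def F(x1, x2, x3):
    """Hypothetical system y_i = f_i(x1, x2, x3), i = 1, 2, to be solved for x1, x2."""
    return (x1 + x2 ** 2 + x3, x1 * x2 + 2 * x2 - x3)

x_bar = (1.0, 1.0, 0.0)
# Matrix of partials dy_i/dx_j (j = 1, 2) evaluated once, at x_bar; its determinant is 1.
L = [[1.0, 2 * x_bar[1]], [x_bar[1], x_bar[0] + 2.0]]

def chord_solve(y, x3, tol=1e-13, max_iter=200):
    x1, x2 = x_bar[0], x_bar[1]               # initial approximation (12)
    for _ in range(max_iter):
        f1, f2 = F(x1, x2, x3)
        dy = (y[0] - f1, y[1] - f2)           # desired Delta y, formula (10)
        dx1, dx2 = solve_2x2(L, dy)           # solve (9) with the frozen matrix
        x1, x2 = x1 + dx1, x2 + dx2           # correction (11)
        if max(abs(dx1), abs(dx2)) < tol:
            break
    return x1, x2

# Base values are y_bar = F(1, 1, 0) = (2, 3); solve for nearby data.
x1, x2 = chord_solve((2.02, 2.98), 0.01)
print(x1, x2, F(x1, x2, 0.01))
```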
The only added complication is that when $r > 1$ the solution of (9) requires a matrix inversion. In addition, the relation

$$|\Delta y - L(\Delta x)| \le \epsilon\,|\Delta x|$$

[where $L$ is the matrix of partial derivatives in (9)] must be shown to imply an inequality of the form

$$|L^{-1}(\Delta y) - \Delta x| \le \rho\,|\Delta x| \qquad (0 < \rho < 1)$$

when $\epsilon$ is sufficiently small.

Exercises

1 Solve $y = u^2 + v^2$ for $u$, given $y = 1$, $v = .01$, by setting $(\bar y, \bar u, \bar v) = (1, 1, 0)$ and using the successive approximations (3), (4) to find $u^{(3)}$. What accuracy is guaranteed for this answer by the formula $|u^{(M)} - u^{(3)}| \le \rho^3\delta$ of the text? Do the same calculations for $v = \epsilon$ instead of $v = .01$, and compare the result to the Taylor series expansion of the function $\sqrt{1 - x^2}$. [The terms in $\epsilon^6$ agree, but the term in $\epsilon^8$ in $u^{(3)}$ is incorrect.]


2 The sequence (3), (4) can be seen graphically as follows. Let $y = f(x)$ be a real-valued function of one variable, let $\bar y = f(\bar x)$ be a given value, and let $y$ be a value near $\bar y$. Draw the graph of $y = f(x)$, draw the tangent line to the graph at $(\bar x, \bar y)$, and draw the horizontal line at height $y$. The point $(x^{(1)}, y)$ is the point of intersection of these two lines. To construct the point $(x^{(2)}, y)$, follow the vertical line through $x^{(1)}$ to the point $(x^{(1)}, f(x^{(1)}))$, then follow the line parallel to the tangent line back to the horizontal line at height $y$. In the same way, one goes from $(x^{(N)}, y)$ to $(x^{(N+1)}, y)$: first go to the point $(x^{(N)}, f(x^{(N)}))$, then construct the line parallel to the original tangent line [at $(\bar x, \bar y)$] and follow it back to the horizontal line at height $y$. Show that this process indeed corresponds to the formula (2). Draw this picture for a few specific functions and show that it converges in all reasonable cases.

Looked at under a microscope directed at the limit point $(x^{(\infty)}, y)$, the graph of $f$ appears as a (nearly) straight line. Let this line be the graph of $b(x - x^{(\infty)}) + y$; that is, let $b = f'(x^{(\infty)})$. Let $a$ be the slope of the original tangent at $\bar x$, i.e. $a = f'(\bar x)$. Draw a picture showing the process 'under a microscope' as a polygonal path. The path consists of vertical lines and lines of slope $a$, bouncing back and forth between a horizontal line and a line of slope $b$. Write exact formulas for this process (it is essentially the geometric series), and give an exact criterion for its convergence or divergence for various values of $a$, $b$. Then give an estimate, in terms of $f'(\bar x)$ and $f'(x^{(\infty)})$, of the asymptotic rate of convergence of the iteration (2). This is the rate at which the error decreases with each step after a large number of steps have been taken.


3 Show that if $f$ is k times continuously differentiable, then so is $g$. [The partial derivatives of $f$ are $k - 1$ times continuously differentiable and $g$ is continuously differentiable. By the explicit expression for the partials of $g$, this shows that $g$ is twice continuously differentiable when $k > 1$. If $k > 2$, then the partial derivatives of $g$ are twice continuously differentiable because they are a composition of twice differentiable functions; hence $g$ is thrice differentiable. Use the fact that a composition of two functions which are j times continuously differentiable is itself j times continuously differentiable (prove this using the Chain Rule). The process then continues inductively to show that $g$ is k times continuously differentiable.]

7.2 Solution of Linear Equations

This section deals with the problem of finding n numbers $(x_1, x_2, \ldots, x_n)$ satisfying

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= y_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= y_2 \\
&\ \,\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= y_n
\end{aligned}$$

when the n numbers $(y_1, y_2, \ldots, y_n)$ and the $n \times n$ matrix of coefficients $(a_{ij})$ are given. The letter $x$ will denote the n-tuple $x = (x_1, x_2, \ldots, x_n)$, the letter $y$ the n-tuple $y = (y_1, y_2, \ldots, y_n)$, and the letter $L$ the $n \times n$ matrix of coefficients $(a_{ij})$. The given equations can then be written simply as $Lx = y$. Given $L$ and $y$, the problem is to find $x$.

Virtually all procedures for solving $Lx = y$ assume that an approximate inverse to $L$ can be found*. That is, they assume that an $n \times n$ matrix $M$ can be found such that, for all $x$, the approximation $MLx \approx x$ is true in some sense. If the exact inverse $M = L^{-1}$ can be found, then of course the problem is completely solved by setting $x = L^{-1}y$. Normally, however, it is not feasible to find an exact inverse, and in fact the necessity of rounding makes it virtually impossible to do so in most cases. On the other hand, it is often not difficult to find an approximate inverse. In many cases which occur in practice, the mapping $L$ can be well approximated by a mapping whose inverse is known explicitly; that is, $L$ can be regarded as a perturbation of an explicitly invertible mapping. In any case, a process of step-by-step elimination can be carried out with some degree of approximation (often a very crude approximation will suffice) to give an approximate inverse $M$.

*Or that $L$ itself is 'nearly the identity', i.e. $Lx \approx x$.


Assume therefore that a matrix $M$ has been found such that $MLx \approx x$. There are two basic techniques for constructing successive approximations to a solution of $Lx = y$, namely:

(a) Correction of $x^{(N)}$. If $x^{(N)}$ is an approximate solution, $Lx^{(N)} \approx y$, then the method of §7.1 gives

$$\begin{aligned}
\text{desired } \Delta y &= L(\text{desired } \Delta x) \\
M(\text{desired } \Delta y) &= ML(\text{desired } \Delta x) \approx \text{desired } \Delta x \\
\text{desired } \Delta x &\approx M(y - Lx^{(N)}).
\end{aligned}$$

Hence one defines

$$x^{(N+1)} = x^{(N)} + M(y - Lx^{(N)}) \tag{1}$$

as the next approximation to a solution of $Lx = y$.
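
The sketch below repeats the correction (1) with a fixed $M$. The matrix $L$, the right side $y$, the crude approximate inverse $M$ (the reciprocals of the diagonal of $L$), and the helper names are hypothetical choices. With this particular $M$, iteration (1) is what is usually called the Jacobi method. The exact solution of this small system is $x = (0.1, 0.6)$.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def correct_x(L, M, y, x0, steps):
    """Apply x^(N+1) = x^(N) + M (y - L x^(N)), formula (1), `steps` times."""
    x = list(x0)
    for _ in range(steps):
        r = [yi - li for yi, li in zip(y, matvec(L, x))]   # residual y - L x^(N)
        dx = matvec(M, r)                                   # desired Delta x ~ M (y - L x^(N))
        x = [xi + di for xi, di in zip(x, dx)]
    return x

L = [[4.0, 1.0], [2.0, 3.0]]
M = [[0.25, 0.0], [0.0, 1.0 / 3.0]]   # approximate inverse: reciprocals of the diagonal
y = [1.0, 2.0]
x = correct_x(L, M, y, [0.0, 0.0], steps=60)
print(x)   # close to [0.1, 0.6]
```

Each step multiplies the error by $I - ML$. Its eigenvalues here are $\pm 1/\sqrt{6}$, so about sixty steps reach roughly machine precision.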


(b) Correction of $M_N$. If $M_N$ is an approximate inverse, $M_N L \approx I$ (where $I$ is the identity map), then

$$\begin{aligned}
M_N L - I &\approx 0 \\
(M_N L - I)^2 &\approx 0 \\
M_N L M_N L - 2M_N L + I &\approx 0 \\
[2M_N - M_N L M_N]\,L &\approx I;
\end{aligned}$$

hence one defines

$$M_{N+1} = 2M_N - M_N L M_N \tag{2}$$

as the next approximation to a solution of $ML = I$. (If $ML - I$ is 'small' in some sense, then its square is 'smaller'.)
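
The computation above amounts to the identity $I - M_{N+1}L = (I - M_N L)^2$, so the error in (2) is squared at every step. This matrix iteration is commonly known as the Newton–Schulz iteration. The sketch reuses the hypothetical $L$ and crude $M_0$ of a diagonal approximate inverse; the helper names are also illustrative.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def correct_M(L, M, steps):
    """Apply M_{N+1} = 2 M_N - M_N L M_N, formula (2), `steps` times."""
    for _ in range(steps):
        MLM = matmul(matmul(M, L), M)    # the two matrix multiplications
        M = [[2 * M[i][j] - MLM[i][j] for j in range(len(M[0]))]
             for i in range(len(M))]
    return M

L = [[4.0, 1.0], [2.0, 3.0]]
M0 = [[0.25, 0.0], [0.0, 1.0 / 3.0]]       # crude approximate inverse
M1 = correct_M(L, M0, steps=1)
E1 = [[(1.0 if i == j else 0.0) - v for j, v in enumerate(row)]
      for i, row in enumerate(matmul(M1, L))]      # I - M1 L
print(E1)   # about [[1/6, 0], [0, 1/6]], which is (I - M0 L)^2
M = correct_M(L, M0, steps=6)
print(M)    # close to the exact inverse [[0.3, -0.1], [-0.2, 0.4]]
```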


Clearly the correction (2) of M, which requires two 
matrix multiplications, involves much more computation 
than the correction (1) of x. On the other hand, if (1) is 
used repeatedly with the same approximation M, then 
one is repeatedly committing the same error, which 
means that the rate of convergence may be poor in 
relation to the amount of arithmetic which is required. 
Thus, as is always the case in large-scale computation, 
judgment must be exercised in choosing a method of 
computation, taking into account the accuracy of the 
data, the degree of accuracy required of the solution, and 
the type of computing machinery being used. 

The procedure (1) is improved considerably by the following simple observation. The numbers $x_i^{(N+1)}$ are defined by the equations

$$x_i^{(N+1)} = x_i^{(N)} + \text{$i$th component of } My - \text{$i$th component of } MLx^{(N)} \qquad (i = 1, 2, \ldots, n). \tag{1}$$

The normal procedure for carrying out the computation indicated by this formula would be to find $x_1^{(N+1)}, x_2^{(N+1)}, \ldots, x_n^{(N+1)}$ in turn. However, this means that at the time $x_i^{(N+1)}$ is being computed, the values $x_1^{(N+1)}, x_2^{(N+1)}, \ldots, x_{i-1}^{(N+1)}$ are already known. These new values are presumably more accurate than the old values $x_1^{(N)}, x_2^{(N)}, \ldots, x_{i-1}^{(N)}$. It is therefore only reasonable to use them in place of the old values in (1), i.e. to change (1) to

$$x_i^{(N+1)} = x_i^{(N)} + \text{$i$th component of } My - \text{$i$th component of } ML \text{ applied to } \bigl(x_1^{(N+1)}, \ldots, x_{i-1}^{(N+1)}, x_i^{(N)}, \ldots, x_n^{(N)}\bigr). \tag{$1'$}$$

Although the formula (1') is very clumsy, the computation it prescribes is in fact much simpler (as well as more accurate) than that prescribed by (1). The method can be stated as follows: correct each of the approximations $x_1, x_2, \ldots, x_n$ in turn, then begin again correcting $x_1, x_2, \ldots$, and so on, ad infinitum. More briefly: correct the components of $x$ in cyclic order.

This algorithm is eminently practical. Given $L$, $y$, and an approximate inverse $M$, one computes the n numbers $My$ and the $n^2$ numbers $ML$. One then chooses a first approximation $(x_1, x_2, \ldots, x_n)$ to a solution of $Lx = y$ on the basis of whatever information is available ($(0, 0, \ldots, 0)$ is the simplest choice) and performs the correction

$$\text{new } x_i = x_i + \text{$i$th component of } (My - MLx) \tag{$1''$}$$

in cyclic order. After each correction (1'') the old value of $x_i$ is discarded, so that at any time only $n^2 + 2n$ numbers need to be 'remembered': $n^2 + n$ for the statement of the problem and $n$ for the latest approximation to the answer. If the approximate inverse $M$ is sufficiently good, then the process in fact converges to a solution of the equation $Lx = y$, as will be proved below. In other words, beyond a certain point one will find that further corrections do not significantly change $(x_1, x_2, \ldots, x_n)$. This set of numbers is then, except for roundoff errors, a solution of the given equation $Lx = y$.

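The cyclic correction (1″) is short to program. Below is a minimal sketch. It assumes dense Python lists for $L$, $M$ and $y$. The function name and the 3 × 3 example are illustrative choices, not taken from the text, and in the example $M$ is taken to be the inverse of the diagonal part of $L$. With that choice the process is the Gauss-Seidel iteration of Exercise 2. Only $ML$, $My$ and the current $x$ are stored, which are the $n^2 + 2n$ numbers counted above.

```python
def cyclic_correction(L, y, M, x0, sweeps):
    """Method (1''): correct x_1, ..., x_n in cyclic order, `sweeps` full cycles."""
    n = len(y)
    # computed once: the n numbers My and the n^2 numbers ML
    My = [sum(M[i][k] * y[k] for k in range(n)) for i in range(n)]
    ML = [[sum(M[i][k] * L[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    x = list(x0)                      # the n numbers of the latest approximation
    for _ in range(sweeps):
        for i in range(n):
            # new x_i = x_i + ith component of (My - MLx); x already holds
            # the corrected values of x_1, ..., x_{i-1}
            x[i] += My[i] - sum(ML[i][j] * x[j] for j in range(n))
    return x


# Illustrative data (not from the text): L x = y has the solution (1, 1, 1),
# and M is the inverse of the diagonal part of L.
L = [[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]]
y = [5.0, 8.0, 8.0]
M = [[1 / 4, 0.0, 0.0], [0.0, 1 / 5, 0.0], [0.0, 0.0, 1 / 6]]

x = cyclic_correction(L, y, M, [0.0, 0.0, 0.0], sweeps=30)
print(x)
```

Each old value of $x_i$ is overwritten in place, exactly as the text describes.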
This algorithm is rendered even more practical by the observation that after the correct values of the $x_i$ have been obtained to a few significant digits they can be set aside and a new process of successive approximation can be set up for the remaining digits. This is done merely by setting $x_i = \bar x_i + z_i$, where $\bar x_i$ is the approximation already obtained and where $z_i$ is the remainder to be found. Then (1″) becomes

$$\bar x_i + \text{new } z_i = \bar x_i + z_i + \text{$i$th component of } \bigl(My - ML(\bar x + z)\bigr).$$

Hence

$$\text{new } z_i = z_i + \text{$i$th component of } (w - MLz)$$

where $w = My - ML\bar x$. Using this method for computing the less significant digits $z_i$ of $x_i$ eliminates the redundant effort of computing $w = My - ML\bar x$ to several significant digits with each application of (1″). A further modification of the method (1″) which may be useful is that the order in which the components $x_i$ are corrected can be altered from a strict cyclical order if it appears in the course of the computation that some components are more in need of correction than others. Such a modification of the order of correction is called a relaxation.

In proving the convergence of the algorithm (1″) it is useful to notice that it is in fact the algorithm (1) with $M$ changed to $M' = (I + T)^{-1}M$, where $T$ is the 'lower triangular' matrix which is equal to $ML$ below the main diagonal and zero on and above it (see Exercise 6). Thus to find criteria for the convergence of (1″) it suffices to find criteria for the convergence of (1) and to apply them to the case where $M' = (I + T)^{-1}M$.

The simplest condition on the approximate inverse $M$ which guarantees the convergence of (1) is the condition* that the approximation $MLx \approx x$ on which it is based satisfy

$$|MLx - x| \le p\,|x| \tag{3}$$

for some fixed $p < 1$, where $|x|$ denotes the maximum of $|x_i|$ for $i = 1, 2, \dots, n$ (that is, $B = |x|$ describes the smallest cube $\{|x_i| \le B;\ i = 1, 2, \dots, n\}$ which contains $x$) and where $|MLx - x|$ is defined accordingly.

*This condition is exactly analogous to the condition $\left|\frac{\Delta y}{a} - \Delta x\right| \le p\,|\Delta x|$ of §7.1, where the 'approximate inverse' was $\frac{1}{a}$.
If the condition (3) is satisfied, if $x^{(0)}$ is an arbitrary $n$-tuple of numbers, if the sequence $x^{(N)}$ is defined by (1), and if $\bar x$ is any solution of $L\bar x = y$, then

$$|x^{(N+1)} - \bar x| = |x^{(N)} - \bar x + M(L\bar x - Lx^{(N)})| = |(I - ML)(x^{(N)} - \bar x)| \le p\,|x^{(N)} - \bar x|.$$


Hence each step reduces the 'distance' from $x^{(N)}$ to $\bar x$ by a factor of $p$ and

$$\lim_{N\to\infty} x^{(N)} = \bar x. \tag{4}$$

Thus if there is a solution of $Lx = y$ it is the limit of $x^{(N)}$ regardless of the choice of $x^{(0)}$. This implies in particular that there is at most one solution, i.e. $L$ is one-to-one. But, by dimensionality, $L$ must therefore be onto. That is to say, $Lx = y$ always does have a solution $x$, which is therefore given by (4). This proves that if condition (3) is satisfied and if $y$ is given, then the sequence (1) is convergent and its limit is the unique solution $x$ of the equation $Lx = y$, regardless of the choice of the initial approximation $x^{(0)}$.


The same condition

$$|(ML - I)x| \le p\,|x| \qquad (\text{fixed } p < 1, \text{ all } x) \tag{3}$$

implies that the sequence $M_N$ defined by (2) with $M_0 = M$ converges to $L^{-1}$. This is proved by noting that

$$I - M_NL = (I - M_{N-1}L)^2 = (I - M_{N-2}L)^4 = \cdots = (I - ML)^{2^N}.$$

Hence for any $y$ the solution $x$ of $Lx = y$ guaranteed by (3) satisfies

$$|M_Ny - x| = |(M_NL - I)x| \le p^{2^N}|x|,$$

and hence $\lim M_Ny = x = L^{-1}y$. This holds for all $y$, which is precisely the meaning of the statement that $\lim_{N\to\infty} M_N = L^{-1}$.
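The squaring identity can be checked exactly. The formula for iteration (2) is not restated in this passage. The sketch therefore assumes it is $M_{N+1} = 2M_N - M_NLM_N$, which is the form for which $I - M_{N+1}L = (I - M_NL)^2$ holds. The 2 × 2 matrix $L$ and the starting matrix $M_0$ are illustrative, and the helper names are hypothetical.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]


def refine_inverse(L, M, steps):
    """Assumed form of iteration (2): M_{N+1} = 2 M_N - M_N L M_N, starting from M_0 = M."""
    for _ in range(steps):
        MLM = matmul(matmul(M, L), M)
        M = [[2 * M[i][j] - MLM[i][j] for j in range(len(M))] for i in range(len(M))]
    return M


def error_matrix(L, M):
    """I - ML, the matrix that the identity above says is squared at every step."""
    ML, I = matmul(M, L), identity(len(L))
    return [[I[i][j] - ML[i][j] for j in range(len(L))] for i in range(len(L))]


L = [[F(4), F(1)], [F(2), F(3)]]
M0 = [[F(1, 4), F(0)], [F(0), F(1, 3)]]      # crude approximate inverse

E0 = error_matrix(L, M0)
E0_to_8 = identity(2)
for _ in range(8):
    E0_to_8 = matmul(E0_to_8, E0)

M3 = refine_inverse(L, M0, 3)
print(error_matrix(L, M3) == E0_to_8)        # True: I - M_3 L = (I - M_0 L)^(2^3)

M6 = refine_inverse(L, M0, 6)
print([[float(v) for v in row] for row in M6])   # approximately L^{-1} = [[0.3, -0.1], [-0.2, 0.4]]
```

Exact fractions are used so that the identity is tested exactly rather than up to roundoff.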
It is easily seen that the condition (3) is satisfied if (and only if) the matrix $M$ is such that the matrix $(b_{ij}) = ML - I$ satisfies

$$\max_{i = 1, 2, \dots, n}\ \sum_{j=1}^{n} |b_{ij}| < 1. \tag{5}$$

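Condition (5) is a finite computation on the entries of $ML - I$. Below is a minimal sketch; the names and matrices are illustrative assumptions. A value of 1 or more shows only that this sufficient condition fails. It does not show that the iteration diverges.

```python
def condition_5(M, L):
    """max over rows i of sum_j |b_ij|, where (b_ij) = ML - I."""
    n = len(L)
    ML = [[sum(M[i][k] * L[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return max(sum(abs(ML[i][j] - (1 if i == j else 0)) for j in range(n))
               for i in range(n))


L = [[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]]
M = [[1 / 4, 0.0, 0.0], [0.0, 1 / 5, 0.0], [0.0, 0.0, 1 / 6]]
print(condition_5(M, L))      # about 0.6 < 1, so (5) holds

L2 = [[1.0, 2.0], [2.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
print(condition_5(I2, L2))    # 2.0, so (5) fails for M = I
```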
For a given matrix $M$ it is a simple matter to check whether or not this condition is satisfied. Thus (5) is an easily verifiable condition which is sufficient for the convergence of the processes (1) and (2). Other sufficient conditions for the convergence of (1) and (2) can be obtained by observing that if $|x|$ denotes any norm (see §9.8), then condition (3) guarantees the convergence of (1) and (2) by exactly the same proof (see Exercises 7, 8). A necessary and sufficient condition for convergence is that

$$\lim_{N\to\infty} (ML - I)^N = 0, \tag{6}$$

where by convergence of (1) one means convergence regardless of the choice of $x^{(0)}$ (since obviously (1) converges if $x^{(0)}$ is a solution to begin with, no matter what $M$ might be). However, the condition (6) is usually difficult to affirm or deny, and the condition (3) is more useful in practice, even though convergence can still occur when it is not fulfilled.

Exercises


1 Simple iteration. A simple iterative procedure for solving $\sum_j a_{ij}x_j = y_i$ is given by moving all but the diagonal term to the right and setting

$$a_{ii}x_i^{(N+1)} = y_i - \sum_{j \ne i} a_{ij}x_j^{(N)}.$$

This gives $n$ equations, one for each $i$, defining $x^{(N+1)}$ in terms of $x^{(N)}$. Find an 'approximate inverse' $M$ of $L = (a_{ij})$ such that this process is the process (1). For which matrices $L$ is the condition (5) satisfied?


2 Gauss-Seidel iteration. A more elementary method than that of simple iteration is the corresponding iteration (1′), i.e.

$$a_{ii}x_i^{(N+1)} = y_i - \sum_{j<i} a_{ij}x_j^{(N+1)} - \sum_{j>i} a_{ij}x_j^{(N)}.$$

What general characteristics should the matrix $L = (a_{ij})$ have in order for this iteration to converge rapidly? Note that in this case the condition (5) is more difficult to state explicitly in terms of $L$. Which method, this one or that of Exercise 1, would you expect to converge more rapidly, and why?


3 Solve the system

$$\begin{aligned} 10x - y + 2z &= 11\\ x - 9y + z &= \ldots\\ -3x + 4y + 34z &= 15 \end{aligned}$$

by explicit elimination (expressing the answer first in terms of rational numbers, then as a decimal fraction), by simple iteration, and by Gauss-Seidel iteration. Use a desk calculator if possible.


4 Set

$$M_0 = \begin{pmatrix} \frac{1}{10} & 0 & 0\\ 0 & -\frac{1}{9} & 0\\ 0 & 0 & \frac{1}{34} \end{pmatrix}$$

and use the iteration (2) to find an approximation to $L^{-1}$, where $L$ is the matrix of the preceding exercise. Use a desk calculator. Using the answer, solve the system of Exercise 3.


5 Having found the solution of the system of Exercise 3 to 
three places by the Gauss-Seidel iteration, set aside these 
three places by the method suggested in the text and set up a 
new iteration for the higher-order decimals. 


6 Show that the method (1′) of the text is the same as the method (1) with $M$ changed to $M' = (I + T)^{-1}M$, where $T$ is the matrix whose entry in the $i$th row and the $j$th column is the corresponding entry of $ML$ if $j < i$ and is zero if $j \ge i$. [Add and subtract $Tx^{(N)}$ on the right and add $Tx^{(N+1)}$ to both sides.] Show that $(I + T)^{-1} = I - T + T^2 - T^3 + \cdots + (-1)^{n-1}T^{n-1}$.

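The claim of Exercise 6 can be tested on an example before it is proved. The sketch below uses exact rational arithmetic from Python's fractions module, an illustrative 3 × 3 system, and hypothetical helper names. It compares one complete cycle of (1″) with one step of (1) in which $M$ is replaced by $M' = (I + T)^{-1}M$. The inverse is computed from the finite series $I - T + T^2 - \cdots$.

```python
from fractions import Fraction as F


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]


def sweep(ML, My, x):
    """One complete cycle of the corrections (1''), always using the newest components."""
    x = list(x)
    for i in range(len(x)):
        x[i] += My[i] - sum(ML[i][j] * x[j] for j in range(len(x)))
    return x


def neumann_inverse(T):
    """I - T + T^2 - ... +/- T^(n-1); this is (I + T)^(-1) because T^n = 0."""
    n = len(T)
    term, total = identity(n), identity(n)
    for k in range(1, n):
        term = matmul(term, T)
        sign = -1 if k % 2 else 1
        total = [[total[i][j] + sign * term[i][j] for j in range(n)] for i in range(n)]
    return total


# Illustrative data (not from the text), in exact rational arithmetic.
L = [[F(4), F(1), F(0)], [F(1), F(5), F(2)], [F(0), F(2), F(6)]]
M = [[F(1, 4), F(0), F(0)], [F(0), F(1, 5), F(0)], [F(0), F(0), F(1, 6)]]
y = [F(5), F(8), F(8)]
x = [F(3), F(-1), F(2)]

ML, My = matmul(M, L), matvec(M, y)
T = [[ML[i][j] if j < i else F(0) for j in range(3)] for i in range(3)]
M_prime = matmul(neumann_inverse(T), M)

# one step of (1) with M replaced by M' = (I + T)^(-1) M
r = [yi - v for yi, v in zip(y, matvec(L, x))]
step = [xi + v for xi, v in zip(x, matvec(M_prime, r))]
print(step == sweep(ML, My, x))   # True
```

The check is for one example only. The exercise asks for the general argument, which the bracketed hint supplies.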

7 If $|x|$ denotes $\sum_{i=1}^{n} |x_i|$, what condition analogous to (5) guarantees (3)?


8 If $|x|$ denotes $\left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$ $(p > 1)$, what condition analogous to (5) guarantees (3)? [Use the Hölder inequality. Let $\mu_i$ be the '$q$-norm' of the $i$th row, $\mu_i = \left[\sum_j |b_{ij}|^q\right]^{1/q}$, where $(1 - q)(1 - p) = 1$, and let $\rho$ be the '$p$-norm' of the $\mu$'s, $\rho = \left[\sum_i \mu_i^{\,p}\right]^{1/p}$.]


9 Show that the condition (3) implies that $|x^{(N)} - \bar x| \le \frac{p^N}{1 - p}\,|x^{(1)} - x^{(0)}|$, where $\bar x$ is the limit of (1). In what way is this estimate more useful than the simpler estimate $|x^{(N+1)} - \bar x| \le p\,|x^{(N)} - \bar x|$?


10 Prove directly from the condition (3) that L is one-to-one 
and onto. [This is easy.] Prove directly from condition (3) that 
the sequence (1) converges and that its limit is a solution. 
[Follow the method of §7.1. It must be shown that M is 
one-to-one and onto if (3) holds.] 


11 Let $A_N$ be a sequence of $n \times n$ matrices and let $B$ be an $n \times n$ matrix. Show that $\lim_{N\to\infty} A_Nx = Bx$ for all $x$ if and only if all entries of $A_N$ converge to the corresponding entries of $B$ as $N \to \infty$.


12 Set up and solve several systems of linear equations 
choosing systems of a size and complexity suitable to the 
computing machinery at your disposal. 


13 Prove that the Gauss-Seidel method (Exercise 2) converges to a solution of $Lx = y$ regardless of the choice of $x^{(0)}$ whenever $L$ is a positive definite symmetric matrix, i.e. $L = (a_{ij})$ where $a_{ij} = a_{ji}$ and where the quadratic form $Q(x_1, x_2, \dots, x_n) = \sum a_{ij}x_ix_j$ is positive except when $x_1 = x_2 = \cdots = x_n = 0$. [Let $\bar x$ be the solution of $L\bar x = y$. Each 'correction' of the Gauss-Seidel procedure moves $x$ along a line parallel to a coordinate axis to the point on that line where $Q(x - \bar x)$ has its minimum. Therefore the value of $Q(x - \bar x)$ is non-increasing as the process continues. If $Q(x - \bar x)$ does not decrease during a complete cycle of $n$ steps, then $x$ must be $\bar x$. By Exercise 6 there is a matrix $N$ such that a complete cycle of $n$ steps carries $x$ to $\bar x + N(x - \bar x)$. The function $Q[N(x - \bar x)]$ must have a maximum on the 'ellipsoid' $Q(x - \bar x) = 1$ (see §9.4) and, by the above, this maximum is less than 1. Call it $p$. Then a complete cycle of $n$ corrections decreases $Q(x - \bar x)$ by a factor of $p$, so that $Q(x - \bar x) \to 0$, which implies $x \to \bar x$ as desired.]

7.3 Newton's Method


In §7.1 the equation $y = f(x)$ was solved for $x$ near $\bar x$, given $y$ near $\bar y = f(\bar x)$, by taking $x^{(0)} = \bar x$ as the initial approximation and by obtaining successive 'corrections' of the approximate solution $x^{(N)}$ using

$$f(x^{(N+1)}) - f(x^{(N)}) \approx f'(\bar x)\,(x^{(N+1)} - x^{(N)}) \tag{1}$$

and the desired formula

$$f(x^{(N+1)}) = y$$

to obtain

$$x^{(N+1)} = x^{(N)} + \frac{y - f(x^{(N)})}{f'(\bar x)}. \tag{2}$$

The approximation (1) is improved by using $f'(x^{(N)})$ instead of $f'(\bar x)$, which would lead one to believe that the method (2) would be improved by changing it to

$$x^{(N+1)} = x^{(N)} + \frac{y - f(x^{(N)})}{f'(x^{(N)})}. \tag{3}$$


This is Newton’s method. 
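Iteration (3) takes only a few lines. The sketch below runs Newton's method next to the cruder rule (2), with the derivative frozen at the starting point. It is a minimal illustration on the equation $x^3 + x = 2$ (root $x = 1$), which was chosen only for this example. The function names `newton` and `frozen_slope` are hypothetical.

```python
def newton(f, fprime, y, x0, steps):
    """Iteration (3): x <- x + (y - f(x)) / f'(x)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + (y - f(x)) / fprime(x))
    return xs


def frozen_slope(f, slope, y, x0, steps):
    """The cruder rule (2): the derivative is evaluated once and then kept fixed."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + (y - f(x)) / slope)
    return xs


def f(x):
    return x**3 + x          # illustrative equation x^3 + x = 2, whose root is x = 1


def fprime(x):
    return 3 * x**2 + 1


fast = newton(f, fprime, 2.0, 2.0, 8)
slow = frozen_slope(f, fprime(2.0), 2.0, 2.0, 8)
for n, (a, b) in enumerate(zip(fast, slow)):
    print(n, abs(a - 1.0), abs(b - 1.0))
```

Once Newton's errors become small they shrink very quickly from step to step. The frozen-slope errors shrink by a roughly constant factor each step.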
Similarly, to solve $r$ equations in $r$ unknowns

$$y_i = f_i(x_1, x_2, \dots, x_r) \qquad (i = 1, 2, \dots, r),$$

Newton's method is to 'correct' an approximate solution $(x_1, x_2, \dots, x_r)$ in three steps:

$$\begin{aligned} \Delta y_i &= y_i - f_i(x_1, x_2, \dots, x_r),\\ \Delta y_i &= \sum_{j=1}^{r} \frac{\partial y_i}{\partial x_j}\,\Delta x_j,\\ \text{new } x_i &= x_i + \Delta x_i, \end{aligned}$$

where the first equation defines the quantities $\Delta y_i$, the second implicitly defines the $\Delta x_j$ (the partial derivatives are evaluated at $(x_1, x_2, \dots, x_r)$), and the third defines the new $x_i$'s.

*This is entirely analogous to the situation in §7.2, where the method (1′) was more accurate and more natural than the method (1) but could not as easily be proved to be convergent.

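The three steps translate directly into code. Below is a minimal sketch. It assumes a hand-written Gaussian elimination for the second step; any linear solver would serve. The example system $x_1^2 + x_2^2 = 5$, $x_1x_2 = 2$ is illustrative and not from the text, and all names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A dx = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    rows = [list(A[i]) + [b[i]] for i in range(n)]   # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(rows[i][k]))
        rows[k], rows[p] = rows[p], rows[k]
        for i in range(k + 1, n):
            factor = rows[i][k] / rows[k][k]
            for j in range(k, n + 1):
                rows[i][j] -= factor * rows[k][j]
    dx = [0.0] * n
    for i in reversed(range(n)):
        s = sum(rows[i][j] * dx[j] for j in range(i + 1, n))
        dx[i] = (rows[i][n] - s) / rows[i][i]
    return dx


def newton_system(f, jacobian, y, x, steps):
    for _ in range(steps):
        dy = [yi - fi for yi, fi in zip(y, f(x))]     # first step: the Delta y_i
        dx = solve_linear(jacobian(x), dy)            # second step: solve for the Delta x_j
        x = [xi + dxi for xi, dxi in zip(x, dx)]      # third step: new x_i = x_i + Delta x_i
    return x


# Illustrative system (not from the text): x1^2 + x2^2 = 5, x1 * x2 = 2.
def f(x):
    return [x[0] ** 2 + x[1] ** 2, x[0] * x[1]]


def jacobian(x):
    return [[2 * x[0], 2 * x[1]], [x[1], x[0]]]


print(newton_system(f, jacobian, [5.0, 2.0], [2.5, 0.5], 6))   # close to [2.0, 1.0]
```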
Newton's method (3) is more accurate and more natural than the method (2), but it does not lend itself to the purposes of §7.1 because it is not easily proved to be convergent.* However, it is easily shown that if $y = f(x)$ has a solution, if the initial approximation $x^{(0)}$ is sufficiently close to this solution, and if $f$ is 'reasonable', then the procedure (3) converges very rapidly to the solution. Let $\bar x$ denote the solution $y = f(\bar x)$ whose existence is assumed. Then the distance of the $(N+1)$st approximation from $\bar x$ is estimated by

$$x^{(N+1)} - \bar x = x^{(N)} - \bar x + \frac{f(\bar x) - f(x^{(N)})}{f'(x^{(N)})} = \frac{1}{f'(x^{(N)})}\int_{\bar x}^{x^{(N)}} \left[f'(x^{(N)}) - f'(x)\right] dx.$$


Thus, if $x^{(N)} - \bar x$ is small, it follows that $x^{(N+1)} - \bar x$ is doubly small because it is the integral of a small function over a small interval. Specifically, suppose 'reasonableness' of $f$ near $\bar x$ is defined to mean that $f''$ exists and is continuous and that $f'(\bar x) \ne 0$. (Otherwise one could set $g(x) = f'(x)$ and find $\bar x$ by solving $g(x) = 0$ using Newton's method, unless $g'(\bar x) = f''(\bar x) = 0$, in which case one could set $g(x) = f''(x)$, etc.) Suppose also that '$x^{(0)}$ sufficiently near $\bar x$' is defined to mean that $|x^{(0)} - \bar x| < \delta$, where $\delta$ is a number satisfying

$$\begin{aligned} |f''(x)| &< B &&\text{for } |x - \bar x| < \delta,\\ |f'(x)| &> A &&\text{for } |x - \bar x| < \delta,\\ \delta &< \frac{A}{B}, \end{aligned}$$


then

$$|f'(x^{(N)}) - f'(x)| = \left|\int_x^{x^{(N)}} f''(u)\,du\right| \le B\,|x^{(N)} - x|.$$

Hence the integrand in the integral above is bounded by $\frac{B}{A}|x^{(N)} - x|$, and the integral itself is bounded by $\frac{B}{2A}|x^{(N)} - \bar x|^2$. Hence

$$|x^{(N+1)} - \bar x| \le \frac{B}{2A}\,|x^{(N)} - \bar x|^2 \tag{4}$$

provided $|x^{(N)} - \bar x| < \delta$. Therefore $x^{(N+1)}$ is nearer to $\bar x$ than $x^{(N)}$ is, by a factor of $\frac{B}{2A}|x^{(N)} - \bar x| < \frac{B}{2A}\delta < 1$. This proves not only that (3) converges provided $|x^{(0)} - \bar x| < \delta$ but also that the convergence is very rapid: The error after $N + 1$ steps is a constant times the square of the error after $N$ steps. Roughly speaking, the number of decimal places of accuracy in $x^{(N)}$ doubles with each step, whereas in the cruder method (2) the number of decimal places of accuracy increases by roughly the same amount with each step (for instance, if $p = .01$ it increases by 2). Although these estimates assume that the solution $\bar x$ is known, the method (3) itself does not. Thus in practice one can merely apply (3) and observe whether the successive $x^{(N)}$ are converging; if they are, then (4) can be used to estimate the rate of convergence.

In the case of $r$ equations in $r$ unknowns the same argument shows that if $x^{(0)}$ is sufficiently near to a solution $\bar x$ where the Jacobian is not zero, and if the functions are twice differentiable, then

$$|x^{(N+1)} - \bar x| \le \text{const.}\,|x^{(N)} - \bar x|^2 \tag{4'}$$

and convergence is again very rapid once it begins.

Exercises


1 Show that Newton's method applied to the equation $y = x^2$ gives

$$x^{(N+1)} = \frac{1}{2}\left(x^{(N)} + \frac{y}{x^{(N)}}\right).$$

This method of finding square roots was known and used in ancient times. Use it to find $\sqrt{2}$ to 11 decimal places. [Use a desk calculator if possible.]

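For Exercise 1, exact rational arithmetic makes it easy to watch the doubling of correct digits predicted by (4). Below is a minimal sketch using Python's standard fractions module and `math.isqrt` (Python 3.8 or later). The helper names `heron` and `truncate` are hypothetical. For $f(x) = x^2$ near $\sqrt 2$, the constant $B/2A$ in (4) is roughly $1/(2\sqrt 2)$, so each error is about a third of the square of the previous one.

```python
from fractions import Fraction
from math import isqrt


def heron(y, x0, steps):
    """Iterate x <- (x + y/x)/2, i.e. Newton's method (3) for f(x) = x**2."""
    x = Fraction(x0)
    iterates = [x]
    for _ in range(steps):
        x = (x + Fraction(y) / x) / 2
        iterates.append(x)
    return iterates


def truncate(x, places):
    """floor(x * 10**places) as an integer, computed exactly."""
    return x.numerator * 10**places // x.denominator


iterates = heron(2, 1, 5)
for n, x in enumerate(iterates):
    # x**2 - 2 measures the error; its size roughly squares at each step
    print(n, x, float(x * x - 2))

# floor(sqrt(2) * 10**11), computed exactly with the integer square root
target = isqrt(2 * 10**22)
print(truncate(iterates[-1], 11) == target)  # True
```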

2 Give Newton's method for finding $\sqrt[3]{y}$. Find $\sqrt[3]{2}$ to 7 decimal places.

3 How many decimal places of $\pi$ must be retained in finding $\sqrt{\pi}$ to 6 decimal places? Find $\sqrt{\pi}$ to 6 places.

4 Find five decimal places of the root of the cubic polynomial $8x^3 - 12x - 1$ of §5.4, p. 164, which lies between $-1$ and $0$.

5 Draw a diagram analogous to that of Exercise 2, §7.1, showing the convergence of Newton's method to a solution of $y = f(x)$, where $f$ is a real-valued function of one variable.

6 Show that if Newton's method is used to solve $y = \frac{1}{x}$, the result is essentially the iteration (2) of §7.2. Find an explicit formula for the $N$th approximation to $(1 - \alpha)^{-1}$, beginning with $x^{(0)} = 1$. [It can be written as a product of $N$ simple factors.]

7.4 Solution of Ordinary Differential Equations

The method of successive approximations which was used to construct a solution of an equation of the form $y = f(x)$ can also be used to construct a solution of a differential* equation

$$\frac{dx}{dt} = f(x) \tag{1}$$

in which the unknown is a function $x(t)$.

*Actually the equation $dx/dt = f(x)$ should be called a 'derivative equation', the differential equation being $dx - f(x)\,dt = 0$ (see §8.5). However, the term 'differential equation' is universally used in the sense of 'equation involving derivatives'. An ordinary differential equation is one which involves ordinary derivatives, as opposed to partial differential equations, which involve partial derivatives.


Theorem 


Let $f: \mathbb{R}^n \to \mathbb{R}^n$ be a differentiable function defined near a point $\bar x$ of $\mathbb{R}^n$. Then the differential equation (1) together with the initial condition $x(0) = \bar x$ defines a unique function $x(t)$ for $t$ near zero, and this function depends differentiably on $\bar x$. More specifically, there is a number $\delta > 0$ and a differentiable function $x: \mathbb{R} \to \mathbb{R}^n$ defined for $\{|t| < \delta\}$ such that $(x_1(0), x_2(0), \dots, x_n(0)) = (\bar x_1, \bar x_2, \dots, \bar x_n)$ and such that

$$\begin{aligned} \frac{dx_1}{dt} &= f_1(x_1(t), x_2(t), \dots, x_n(t)),\\ \frac{dx_2}{dt} &= f_2(x_1(t), x_2(t), \dots, x_n(t)),\\ &\;\;\vdots\\ \frac{dx_n}{dt} &= f_n(x_1(t), x_2(t), \dots, x_n(t)). \end{aligned} \tag{1'}$$


Any other function defined on $\{|t| < \delta\}$ and satisfying these conditions is identical with $x(t)$ for $\{|t| < \delta\}$. Finally, if this function is written $(x_1(\bar x_1, \bar x_2, \dots, \bar x_n, t), \dots, x_n(\bar x_1, \bar x_2, \dots, \bar x_n, t))$, making explicit its dependence on $(\bar x_1, \bar x_2, \dots, \bar x_n)$, then each $x_i(\bar x_1, \bar x_2, \dots, \bar x_n, t)$ is a differentiable function of its $n + 1$ variables.

*The method of successive approximations was used to prove the existence of solutions of differential equations by Emile Picard. The iterative formula (2) is known as Picard's iteration.


Proof 


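The unknown in this theorem is a function $\mathbb{R} \to \mathbb{R}^n$ rather than a point of $\mathbb{R}^n$ as it was in the preceding sections. Let $F$ denote the operation $F(x(t)) = \frac{dx}{dt} - f[x(t)]$ assigning to each function $x: \mathbb{R} \to \mathbb{R}^n$ a new function $F(x): \mathbb{R} \to \mathbb{R}^n$. The problem is then to find a function $x$ such that $F(x) = 0$. Thus in the formula

$$\text{desired } \Delta y = L(\text{chosen } \Delta x)$$

the desired $\Delta y$ is

$$0 - F(x^{(N)}) = -\frac{dx^{(N)}}{dt} + f[x^{(N)}(t)].$$

The operation $L$ assigning to each function $x: \mathbb{R} \to \mathbb{R}^n$ a new function $L(x): \mathbb{R} \to \mathbb{R}^n$ should be an invertible operation which approximates the operation $F$. If $L$ is taken to be ordinary differentiation, $L(x) = \frac{dx}{dt}$, then the inverse of $L$ is given by the Fundamental Theorem of Calculus. Taking all approximate solutions $x^{(N)}$ to satisfy $x^{(N)}(0) = \bar x$, this leads to

$$\begin{aligned} \text{desired } \Delta y &= L(\text{chosen } \Delta x),\\ -\frac{dx^{(N)}}{dt} + f[x^{(N)}(t)] &= \frac{d}{dt}\left(x^{(N+1)} - x^{(N)}\right),\\ \int_0^t \left(-\frac{dx^{(N)}}{du}(u) + f[x^{(N)}(u)]\right) du &= x^{(N+1)}(t) - x^{(N)}(t), \end{aligned}$$

which gives $x^{(N+1)}$ in terms of $x^{(N)}$ as

$$x^{(N+1)}(t) = \bar x + \int_0^t f[x^{(N)}(u)]\,du. \tag{2}$$

Together with the initial approximation

$$x^{(0)}(t) = \bar x \tag{3}$$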


the formula* (2) defines an infinite sequence of functions $x^{(N)}(t)$, each of which is defined on some interval containing $t = 0$. (From the fact that $x^{(0)}(u)$ is defined for $u$ sufficiently near zero it follows that $x^{(1)}(u)$ is defined for $u$ sufficiently near zero, hence that $x^{(2)}(u)$ is defined for $u$ sufficiently near zero, etc.) The main step in the proof of the theorem is to show that there is a $\delta > 0$ with the following properties:

- the functions $x^{(N)}(t)$ are all defined for $|t| < \delta$;
- the limit $\lim_{N\to\infty} x^{(N)}(t) = x^{(\infty)}(t)$ exists for $|t| < \delta$;
- the limit function $x^{(\infty)}(t)$ is differentiable and satisfies the differential equation (1);
- any function $\tilde x(t)$ which satisfies (1) and $\tilde x(0) = \bar x$ must be identical with $x^{(\infty)}$ for $|t| < \delta$.

It will then be shown that $x^{(\infty)}(t)$ depends differentiably on $\bar x$.

*Such a number $A$ exists because a continuous function on a cube $\{|x - \bar x| < B\}$ is bounded (see §9.4).

Let $B$ be a number such that $f: \mathbb{R}^n \to \mathbb{R}^n$ is defined at all points $x$ within $B$ of $\bar x$, i.e. at all points $x$ satisfying $\{|x - \bar x| < B\}$, where $|x|$ denotes $\max\{|x_1|, |x_2|, \dots, |x_n|\}$. Let* $A$ be a number which is larger than all values of $|f_i(x_1, x_2, \dots, x_n)|$ at points $x$ in $\{|x - \bar x| < B\}$ for $i = 1, 2, \dots, n$. (Intuitively, $A$ is a bound on the velocity specified by (1).) Setting $\delta = B/A$ and assuming $x^{(N)}(t)$ is defined and lies in $\{|x - \bar x| < B\}$ for $|t| < \delta$ gives

$$|x^{(N+1)}(t) - \bar x| = \left|\int_0^t f[x^{(N)}(u)]\,du\right| \le A|t| < B.$$

That is, $x^{(N+1)}(t)$ is defined and lies in $\{|x - \bar x| < B\}$. Since $x^{(0)} = \bar x$ is defined and lies in $\{|x - \bar x| < B\}$ for $|t| < \delta$, this proves that the entire sequence $x^{(0)}(t), x^{(1)}(t), x^{(2)}(t), \dots$ is defined and lies inside $\{|x - \bar x| < B\}$ for all $t$ in the interval $|t| < \delta$. (Intuitively, a particle whose velocity never exceeds $A$ in any direction cannot move a distance of $B$ in any direction in less than time $B/A$.)

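Picard's iteration (2) can be carried out exactly when $f$ is a polynomial, since each $x^{(N)}$ is then a polynomial in $t$. The sketch below uses the equation $dx/dt = x^2$ with $\bar x = 1$, chosen only for illustration (its solution is $1/(1 - t)$). With $B = 1$ one may take $A = 4$, since $x^2 < 4$ when $|x - 1| < 1$, so the argument above controls the iterates for $|t| < \delta = 1/4$. Coefficients are stored as lists of Fractions, and all names are hypothetical.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Product of polynomials stored as coefficient lists (index = power of t)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_integrate(p):
    """The antiderivative that vanishes at t = 0."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(p)]


def evaluate(p, t):
    return sum(c * t**k for k, c in enumerate(p))


def picard_x_squared(x_bar, steps):
    """Picard iterates (2) for dx/dt = x^2, x(0) = x_bar, starting from (3)."""
    x = [Fraction(x_bar)]                              # x^(0)(t) = x_bar
    iterates = [x]
    for _ in range(steps):
        x = poly_integrate(poly_mul(x, x))             # int_0^t [x^(N)(u)]^2 du
        x[0] += Fraction(x_bar)                        # x_bar + the integral
        iterates.append(x)
    return iterates


iterates = picard_x_squared(1, 5)
t = Fraction(1, 4)                    # t = delta = B/A with B = 1, A = 4
for n, p in enumerate(iterates):
    print(n, float(evaluate(p, t)), float(evaluate(p, -t)))
# the exact solution 1/(1 - t) equals 4/3 at t = 1/4 and 4/5 at t = -1/4
```

The degree of $x^{(N)}$ doubles at each step, but only its first $N + 1$ coefficients are already final. This is one reason Picard's iteration is mainly a tool for proofs rather than a practical method of computation.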
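The fact that the sequence $x^{(0)}(t), x^{(1)}(t), x^{(2)}(t), \dots$ converges for $|t| < \delta$ is proved, as before, by estimating the size of each step $|x^{(N+1)}(t) - x^{(N)}(t)|$. First of all

$$|x^{(1)}(t) - x^{(0)}(t)| = \left|\int_0^t f(\bar x)\,du\right| \le A|t|$$

as above. Then

$$|x^{(2)}(t) - x^{(1)}(t)| = \left|\int_0^t \left(f[x^{(1)}(u)] - f[x^{(0)}(u)]\right) du\right|.$$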

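The integrand is estimated, as in §5.3, by writing $f(x^{(1)}) - f(x^{(0)})$ as an integral of the partial derivatives of $f$ over a broken line from $x^{(0)}$ to $x^{(1)}$, which gives

$$|f(x^{(1)}) - f(x^{(0)})| \le K\,|x^{(1)} - x^{(0)}| \tag{4}$$

where* $K$ is a number such that $K/n$ is larger than all values of $|\partial f_i/\partial x_j|$ at points of $\{|x - \bar x| < B\}$ for all $i, j$. This gives

$$|x^{(2)}(t) - x^{(1)}(t)| \le K\left|\int_0^t |x^{(1)}(u) - x^{(0)}(u)|\,du\right| \le K\left|\int_0^t A|u|\,du\right| = KA\frac{|t|^2}{2}.$$

Similarly

$$\begin{aligned} |x^{(3)}(t) - x^{(2)}(t)| &= \left|\int_0^t \left(f[x^{(2)}(u)] - f[x^{(1)}(u)]\right) du\right|\\ &\le K\left|\int_0^t |x^{(2)}(u) - x^{(1)}(u)|\,du\right|\\ &\le K\left|\int_0^t KA\frac{|u|^2}{2}\,du\right| = K^2A\frac{|t|^3}{3!} \end{aligned}$$

and in general

$$|x^{(N+1)}(t) - x^{(N)}(t)| \le K^NA\frac{|t|^{N+1}}{(N+1)!}. \tag{5}$$

Choose an integer $J$ such that $J > K|t|$ and let $p = \frac{K|t|}{J} < 1$. Then for $M > N > J$ the estimate

$$\begin{aligned} |x^{(M)}(t) - x^{(N)}(t)| &\le |x^{(M)}(t) - x^{(M-1)}(t)| + \cdots + |x^{(N+1)}(t) - x^{(N)}(t)|\\ &\le K^{M-1}A\frac{|t|^{M}}{M!} + \cdots + K^{N}A\frac{|t|^{N+1}}{(N+1)!}\\ &\le \frac{K^{J-1}A|t|^{J}}{J!}\left(p^{M-J} + \cdots + p^{N-J+1}\right)\\ &= \text{const.}\,(p^N - p^M) < \text{const.}\,p^N \end{aligned}$$

proves that the Cauchy Criterion is satisfied; hence the sequence $x^{(N)}(t)$ converges.

*See preceding note.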

Moreover, this estimate is uniform for $t$ in the interval $|t| < \delta$. That is, given $\epsilon > 0$ there is an $N$ such that $|x^{(m)}(t) - x^{(n)}(t)| < \epsilon$ whenever $m, n > N$ and whenever $|t| < \delta$. It follows* not only that the limit function $x^{(\infty)}(t)$ is continuous but also that one can pass to the limit under the integral sign in the equation

$$x^{(N+1)}(t) = \bar x + \int_0^t f[x^{(N)}(u)]\,du$$

to obtain the equation

$$x^{(\infty)}(t) = \bar x + \int_0^t f[x^{(\infty)}(u)]\,du.$$

This proves that

$$\frac{dx^{(\infty)}}{dt} = \lim_{h\to 0}\frac{x^{(\infty)}(t + h) - x^{(\infty)}(t)}{h} = f[x^{(\infty)}(t)].$$

Thus $x^{(\infty)}(t)$ is differentiable and satisfies the differential equation (1).

*See §9.6 and §9.7. Also Exercise 7.

If $\tilde x(t)$ is any other function on $\{|t| < \delta\}$ which satisfies the differential equation (1) and satisfies the 'initial condition' $\tilde x(0) = \bar x$, then as long as $\tilde x(t)$ remains in $\{|x - \bar x| < B\}$ the estimates

$$\begin{aligned} |\tilde x(t) - x^{(0)}(t)| &= \left|\int_0^t f[\tilde x(u)]\,du\right| \le A|t|,\\ |\tilde x(t) - x^{(1)}(t)| &= \left|\int_0^t \left(f[\tilde x(u)] - f[x^{(0)}(u)]\right) du\right| \le K\left|\int_0^t |\tilde x(u) - x^{(0)}(u)|\,du\right|\\ &\le K\left|\int_0^t A|u|\,du\right| = KA\frac{|t|^2}{2}, \end{aligned}$$

and similarly

$$|\tilde x(t) - x^{(N)}(t)| \le K^NA\frac{|t|^{N+1}}{(N+1)!}$$

all hold. Hence $\tilde x(t) = \lim_{N\to\infty} x^{(N)}(t) = x^{(\infty)}(t)$ for as long as $\tilde x(t)$ remains in $\{|x - \bar x| < B\}$. But since $x^{(\infty)}(t)$ lies inside $\{|x - \bar x| < B\}$ for $|t| < \delta$, this means $\tilde x$ cannot leave $\{|x - \bar x| < B\}$ without becoming discontinuous. Since $\tilde x$ is differentiable by assumption, it cannot be discontinuous; hence $\tilde x$ stays in $\{|x - \bar x| < B\}$ and $\tilde x(t) = x^{(\infty)}(t)$ for all $t$ in $\{|t| < \delta\}$. (More precisely, $\tilde x(t)$ lies inside $\{|x - \bar x| < B\}$ for some open interval of $t$ containing $t = 0$. The largest such open interval must contain $\{|t| < \delta\}$ because otherwise the argument above would show that it could be extended.)

It remains only to show that $x^{(\infty)}(t)$ depends differentiably on $\bar x$. Now $x^{(0)}(t) = \bar x$ clearly depends differentiably on $\bar x$, and if $x^{(N)}(t)$ depends differentiably on $\bar x$ then so does

$$x^{(N+1)}(t) = \bar x + \int_0^t f[x^{(N)}(u)]\,du$$

by differentiation under the integral sign (see Exercise 5, §9.4). Thus the functions $x^{(N)}(t)$ all depend differentiably on $\bar x$, and the problem is to show that this is also true of their limit $x^{(\infty)}(t)$. Let $(h_1, h_2, \dots, h_n)$ be a fixed $n$-tuple, and let $x_s^{(N)}(t)$ denote the $N$th approximation to the solution of (1) for the initial condition $\bar x + sh$, where $s$ is a (small) real number. Then

$$x_s^{(N)}(t) - x_0^{(N)}(t) = \int_0^s \frac{\partial}{\partial s}\left[x_s^{(N)}(t)\right] ds$$

for all $N$. The method of proof is to show that the functions $\frac{\partial}{\partial s}[x_s^{(N)}(t)]$ approach a limit $F(s, t)$ as $N \to \infty$, that $F(s, t)$ depends continuously on $\bar x$ and $t$, and that integration can be interchanged with passage to the limit as $N \to \infty$ to give

$$x_s^{(\infty)}(t) - x_0^{(\infty)}(t) = \int_0^s F(s, t)\,ds.$$

Then dividing by $s$ and letting $s \to 0$ will give

$$\frac{\partial}{\partial s}\,x_s^{(\infty)}(t) = F(s, t)$$

at $s = 0$, and the theorem will follow. Thus the essence of the proof is to examine the dependence of the functions $\frac{\partial}{\partial s}[x_s^{(N)}(t)]$ on $N$ for large $N$.


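Now

$$\left|\frac{\partial x_s^{(1)}(t)}{\partial s} - \frac{\partial x_s^{(0)}(t)}{\partial s}\right| = \left|\frac{\partial}{\partial s}\int_0^t f[\bar x + sh]\,du\right| \le \left|\int_0^t K|h|\,du\right| = K|h|\,|t|,$$

$$\left|\frac{\partial x_s^{(2)}(t)}{\partial s} - \frac{\partial x_s^{(1)}(t)}{\partial s}\right| \le K\left|\int_0^t \left|\frac{\partial x_s^{(1)}(u)}{\partial s} - \frac{\partial x_s^{(0)}(u)}{\partial s}\right| du\right| \le K\left|\int_0^t K|h|\,|u|\,du\right| = K^2|h|\frac{|t|^2}{2},$$

and similarly

$$\left|\frac{\partial x_s^{(N+1)}(t)}{\partial s} - \frac{\partial x_s^{(N)}(t)}{\partial s}\right| \le |h|K^{N+1}\frac{|t|^{N+1}}{(N+1)!}.$$

Therefore, for $M > N$,

$$\left|\frac{\partial x_s^{(M)}(t)}{\partial s} - \frac{\partial x_s^{(N)}(t)}{\partial s}\right| \le |h|K^{M}\frac{|t|^{M}}{M!} + \cdots + |h|K^{N+1}\frac{|t|^{N+1}}{(N+1)!}.$$

Since this number can be made small for large $N$ and for all $M > N$ (as was shown above), it follows that the functions $\frac{\partial}{\partial s}[x_s^{(N)}(t)]$ approach a limit as $N \to \infty$. Moreover, this limit is approached in such a way (namely, the error is uniformly small for all $s$) that the limit function is continuous and that the integral of the limit is the limit of the integrals. This completes the proof of the theorem.

Exercises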


1 The trigonometric functions. Geometrically, a differential 
equation can be imagined as a rule specifying an arrow at each 
point of R” and a solution can be imagined as a parameterized 
curve whose velocity at each point is equal to the specified 
arrow. 


(a) Sketch the arrows in the plane specified by the equation

$$\frac{dx}{dt} = -y, \qquad \frac{dy}{dt} = x.$$

That is, the velocity in the $x$-direction is minus the $y$-coordinate, and in the $y$-direction it is plus the $x$-coordinate.

(b) Prove that if $(x(t), y(t))$ is any solution of the given equation, then the value of $[x(t)]^2 + [y(t)]^2$ is constant. Interpret this geometrically.

(c) Define the functions $\cos t$, $\sin t$ by taking them to be the two coordinates of that solution of the above equation which begins at $(1, 0)$ at time 0. [Hence by (b) $\cos^2 t + \sin^2 t = 1$.] Use Picard's iteration (2) to give a convergent power series representation of $(\cos t, \sin t)$ valid for all $t$. [Show that $x^{(N)}(t)$ converges as $N \to \infty$ for all $t$.]

(d) Express the solution which begins at a given point $(\bar x, \bar y)$ in terms of $\bar x$, $\bar y$, $\cos t$, $\sin t$.

(e) Use (d) and the uniqueness of the solution of a differential equation to prove the addition formula of the trigonometric functions, i.e. the formulas for $\cos(a + b)$, $\sin(a + b)$.

(f) Express the addition formula as a statement about the product of two matrices of the form

$$\begin{pmatrix} \cos a & \sin a\\ -\sin a & \cos a \end{pmatrix}.$$

(g) Express the addition formulas as De Moivre's law

$$\cos(a + b) + i\sin(a + b) = [\cos a + i\sin a][\cos b + i\sin b].$$


(h) Prove that $(\cos t, \sin t)$ moves around the circle in the monotone way suggested by the differential equation. [$\cos t$ decreases as long as $\sin t > 0$, and $\sin t$ can change sign only when $\cos t = \pm 1$.]

(i) Use (h) to define the number $2\pi$; then prove that $\pi$ is the area of the disk $x^2 + y^2 < 1$. [Stokes' theorem.]

(j) Scaling. In tabulating the functions cos, sin defined above there is a great practical advantage in listing the values $\cos t$, $\sin t$ for $t$ equal to evenly-spaced rational multiples of $2\pi$ rather than for $t$ equal to evenly-spaced rational numbers. Describe this advantage.

(k) Assuming that the value of $\pi$ is known to five decimal places, $\pi \approx 3.14159$, estimate $\sin 1^\circ$ to four decimal places. [Use the power series of part (c).]

(l) In the same way, estimate $\sin 6^\circ$ to four decimal places.

(m) Use the power series to estimate $\cos 6^\circ$.

(n) Use these results and the addition formula to estimate $\sin 12^\circ$. Thus trigonometric tables are quite easily constructed on the basis of an accurate estimate of $\pi$, the power series for $\sin t$, $\cos t$ for small values of $t$, and the addition formulas.

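Part (c) can be previewed mechanically. Picard's iteration (2) for the system of (a), started from the constant $(1, 0)$, produces polynomials with exact coefficients. Below is a minimal sketch; the coefficient-list representation and the names are choices made for this sketch only. Each pass adds one more term of each power series, and the evaluation uses the value $\pi \approx 3.14159$ from part (k).

```python
import math
from fractions import Fraction


def integrate(p):
    """Antiderivative, vanishing at t = 0, of a polynomial given by its coefficients."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(p)]


def picard_rotation(steps):
    """Picard iterates (2) for dx/dt = -y, dy/dt = x with initial point (1, 0)."""
    x, y = [Fraction(1)], [Fraction(0)]          # the constant first approximation
    for _ in range(steps):
        new_x = [-c for c in integrate(y)]       # -int_0^t y(u) du
        new_x[0] += 1                            # 1 - int_0^t y(u) du
        new_y = integrate(x)                     # 0 + int_0^t x(u) du
        x, y = new_x, new_y
    return x, y


def evaluate(p, t):
    return sum(float(c) * t**k for k, c in enumerate(p))


cos_poly, sin_poly = picard_rotation(10)
print(cos_poly[:5])                    # 1, 0, -1/2, 0, 1/24 (as Fractions)
t = 3.14159 / 180                      # one degree, using pi ~ 3.14159 as in part (k)
print(round(evaluate(sin_poly, t), 4), round(evaluate(cos_poly, t), 4))
```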

2 The exponential function.

(a) Define the exponential function $\exp(t) = e^t$ as the solution of a differential equation.

(b) Prove the 'addition formula' $\exp(x + y) = \exp(x)\exp(y)$.

(c) Find the power series expansion as in Exercise 1.

(d) If the number $e$ is defined to be $\exp(1)$, then the statement

$$e = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n$$

becomes a theorem to be proved. Do so. [By the Fundamental Theorem of Calculus, $\exp\left(\frac{1}{n}\right) - 1 > \frac{1}{n}$ and $1 - \exp\left(-\frac{1}{n}\right) < \frac{1}{n}$. Use this to show that $\left(1 + \frac{1}{n}\right)^n$ is less than $e$ but that if it is multiplied by $1 + \frac{1}{n}$ it becomes greater than $e$.]

(e) Scaling. In tabulating the function $\exp(x)$ there is a great practical advantage in first finding the number $\alpha$ such that $\exp(\alpha) = 10$ and listing the values of $\exp(t)$ for $t$ equal to evenly-spaced rational multiples of $\alpha$. Describe this advantage. Such a table is called a 'table of anti-logarithms to the base 10'.

(f) The number $\alpha$ above is called the natural logarithm of 10. Given that its value is 2.302585 to six places, estimate $10^{1.001}$ with bounds on the error. [$10^{1.001}$ is defined to be the one-thousandth root of $10^{1001}$. Using $\exp(x + y) = \exp(x)\exp(y)$, it is easily expressed in terms of exp and $\alpha$.]

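The inequalities in the hint to (d) can at least be checked numerically, although a check is not a proof. Below is a minimal sketch in exact rational arithmetic that compares against the double-precision constant `math.e`. The function name `e_brackets` is hypothetical.

```python
import math
from fractions import Fraction


def e_brackets(n):
    """Exact values of (1 + 1/n)^n and (1 + 1/n)^(n + 1)."""
    base = 1 + Fraction(1, n)
    lower = base ** n
    return lower, lower * base


for n in (1, 10, 100, 1000):
    lower, upper = e_brackets(n)
    # a Fraction is compared exactly with the double-precision value math.e
    print(n, float(lower), float(upper), lower < math.e < upper)
```

The width of each bracket is $\left(1 + \frac1n\right)^n \cdot \frac1n$, which shrinks roughly like $e/n$. This slow convergence is why the limit in (d) is a poor way to compute $e$.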

3 The hyperbolic functions. Consider the solution of the differential equation

$$\frac{dx}{dt} = y, \qquad \frac{dy}{dt} = x$$

satisfying $x(0) = 1$, $y(0) = 0$. This solution is $(\cosh t, \sinh t)$ by definition of these functions, called the hyperbolic functions. The formula $\cos^2 t + \sin^2 t = 1$, the addition formulas of the trigonometric functions, and the power series expansions of $\cos t$, $\sin t$ all have analogs for $\cosh t$, $\sinh t$. Derive these analogs.


4 The exponential of a 2 × 2 matrix. Show that Picard's method applied to the differential equation

$$\frac{dx}{dt} = ax + by, \qquad \frac{dy}{dt} = cx + dy$$

with the initial value $(x(0), y(0)) = (\bar x, \bar y)$ yields the solution

$$\begin{pmatrix} x(t)\\ y(t) \end{pmatrix} = \exp(tM)\begin{pmatrix} \bar x\\ \bar y \end{pmatrix}$$

where

$$M = \begin{pmatrix} a & b\\ c & d \end{pmatrix}$$

and where $\exp(tM)$ is a 2 × 2 matrix defined by an infinite series. Let $a + bi$ represent the 2 × 2 matrix

$$\begin{pmatrix} a & -b\\ b & a \end{pmatrix}$$

and find $\exp(a + bi)$. (Such a 2 × 2 matrix is called a 'complex number'.) Show that $\exp(z_1 + z_2) = \exp(z_1)\exp(z_2)$ for $z_1$, $z_2$ complex numbers. Show, on the other hand, that if

$$M_1 = \ldots, \qquad M_2 = \ldots$$

then $\exp(M_1 + M_2) \ne \exp(M_1)\exp(M_2)$. Prove that if $M_1$, $M_2$ are 2 × 2 matrices such that $M_1M_2 = M_2M_1$, then $\exp(M_1 + M_2) = \exp(M_1)\exp(M_2)$. If the binomial theorem dealt with the functions $x^n/n!$ rather than the functions $x^n$, what would its statement be?

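A truncated series is enough to experiment with $\exp(tM)$ for 2 × 2 matrices. The sketch below assumes that 30 terms of the series suffice for the small matrices used, which holds here since the omitted terms are far below rounding error. The helper names are hypothetical. The non-commuting pair is an illustrative choice and not necessarily the pair intended in the exercise.

```python
import math


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]


def mat_exp(M, terms=30):
    """Partial sum I + M + M^2/2! + ... + M^(terms-1)/(terms-1)!."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]            # holds M^k / k!
    for k in range(1, terms):
        power = [[entry / k for entry in row] for row in mat_mul(power, M)]
        result = mat_add(result, power)
    return result


def complex_matrix(a, b):
    """The 2 x 2 matrix [[a, -b], [b, a]] standing for a + bi."""
    return [[a, -b], [b, a]]


def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))


z1, z2 = complex_matrix(1.0, 2.0), complex_matrix(0.5, -1.0)
print(close(mat_exp(mat_add(z1, z2)), mat_mul(mat_exp(z1), mat_exp(z2))))   # True

# A non-commuting pair chosen for illustration
M1, M2 = [[0.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]]
print(close(mat_exp(mat_add(M1, M2)), mat_mul(mat_exp(M1), mat_exp(M2))))   # False
```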

5 Cauchy's polygon. The geometrical meaning of a differential equation $dx/dt = f(x)$ can be seen by constructing polygonal approximations to the solutions as follows. Given a large integer $N$, let $P_0^{(N)}$ be the given initial point $\bar x$, let $P_1^{(N)}$ be the point $P_0^{(N)} + \frac{1}{N}f(P_0^{(N)})$, let $P_2^{(N)}$ be the point $P_1^{(N)} + \frac{1}{N}f(P_1^{(N)})$, and, in general, let

$$P_{i+1}^{(N)} = P_i^{(N)} + \frac{1}{N}f(P_i^{(N)})$$

for integers $i \ge 0$. The definition for $i < 0$ is

$$P_i^{(N)} = P_{i+1}^{(N)} - \frac{1}{N}f(P_{i+1}^{(N)}).$$

(a) For the equation $dx/dt = x$, $x(0) = 1$, plot the points $(i/N, P_i^{(N)})$ in the $tx$-plane, for $N = 10$ and $i = 0, \pm 1, \pm 2, \dots, \pm 10$. Give the exact coordinates of these 21 points.
(b) Plot the 21 points $P_i^{(N)}$ for $N = 10$, $|i| \le 10$ of the Cauchy polygon of the equation of Exercise 1, giving their coordinates exactly.

(c) Prove in the case of (a) that as $N \to \infty$ the polygon approaches the actual solution.*

*It can be shown that this is in fact true for an arbitrary equation $dx/dt = f(x)$; it was by this method that Cauchy proved the existence of a solution.

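The construction in Exercise 5 is simple enough to carry out exactly for part (a). Below is a minimal sketch for a scalar equation; a vector version for part (b) would need componentwise arithmetic. The function name `cauchy_polygon` is hypothetical. The backward loop uses the rule $P_{i-1} = P_i - \frac1N f(P_i)$, which is the text's rule for $i < 0$ with the index shifted by one.

```python
from fractions import Fraction


def cauchy_polygon(f, x_bar, N, steps):
    """Vertices (i/N, P_i) for i = -steps, ..., steps of Cauchy's polygon for dx/dt = f(x)."""
    h = Fraction(1, N)
    P = {0: x_bar}
    for i in range(steps):                 # P_{i+1} = P_i + f(P_i)/N
        P[i + 1] = P[i] + h * f(P[i])
    for i in range(0, -steps, -1):         # P_{i-1} = P_i - f(P_i)/N
        P[i - 1] = P[i] - h * f(P[i])
    return [(Fraction(i, N), P[i]) for i in range(-steps, steps + 1)]


# part (a): dx/dt = x, x(0) = 1, N = 10
points = cauchy_polygon(lambda x: x, Fraction(1), 10, 10)
for t, p in points:
    print(t, p)

# toward part (c): the vertex at t = 1 for growing N
for N in (10, 100, 1000):
    print(N, float(cauchy_polygon(lambda x: x, Fraction(1), N, N)[-1][1]))
```

The last loop prints $(1 + 1/N)^N$, which links part (c) with Exercise 2(d).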

6 Note that the functions $x(t) = (t + c)^3$ and $x(t) = 0$ all satisfy $dx/dt = 3x^{2/3}$. Plot these curves in the $tx$-plane. Show that the equation

$$\frac{dx}{dt} = 3x^{2/3}, \qquad x(0) = -1,$$

has infinitely many solutions. [One can stop for an arbitrary period of time at $x = 0$.] Why doesn't this contradict the theorem?


7 Suppose that a sequence of continuous functions $x^{(N)}(t)$ defined for $|t| < \delta$ has the property that 'for every $\epsilon > 0$ there is an $N$ such that for $n, m > N$ and $|t| < \delta$ the inequality $|x^{(n)}(t) - x^{(m)}(t)| < \epsilon$ holds.' Show that there is a uniquely determined limit function $x^{(\infty)}(t)$ defined for $|t| < \delta$ and that this function is continuous. Show, moreover, that if the condition (4) is satisfied, then $\lim_{N\to\infty}\int_0^t f[x^{(N)}(s)]\,ds$ exists, $\int_0^t f[x^{(\infty)}(s)]\,ds$ exists, and the two are equal.


8 Equations of higher order. A $k$th order ordinary differential equation is an equation of the form

$$F\left(t, x, \frac{dx}{dt}, \frac{d^2x}{dt^2}, \dots, \frac{d^kx}{dt^k}\right) = 0 \tag{*}$$

where $F$ is a differentiable function of $k + 2$ variables. Suppose numbers

$$x(0), \quad \frac{dx}{dt}(0), \quad \frac{d^2x}{dt^2}(0), \quad \dots, \quad \frac{d^kx}{dt^k}(0)$$

are given which satisfy (*) with $t = 0$, and suppose (*) can be solved locally near the given point for $d^kx/dt^k$ as a function of the remaining variables. Prove that then there is a $\delta > 0$ and a curve $x(t)$ defined for $|t| < \delta$ such that the equation (*) is satisfied and the derivatives $d^jx/dt^j$ $(j = 0, 1, \dots, k)$ have the specified values for $t = 0$. Show, moreover, that any solution of (*) with the specified values of $d^jx/dt^j(0)$ coincides with this one. [Introduce new variables $y_1 = dx/dt$, $y_2 = d^2x/dt^2$, …, $y_{k-1} = d^{k-1}x/dt^{k-1}$ and apply the theorem of the text.]


9 Combine Exercise 8 and Exercise 1 to give an explicit solution of the equation

$$\frac{d^2x}{dt^2} + x = 0$$


Chapter7 | Practical Methods of Solution 256 




where $x(t)$ is a real-valued function. This equation governs simple oscillations in which there is a 'restoring force' equal and opposite to the distance $x$ from equilibrium.


10 The exponential of an $n \times n$ matrix. Generalize Exercise 4 to $n \times n$ matrices $M$ by defining $\exp(M)$ and by giving a differential equation whose solutions can be expressed in terms of $\exp(tM)$. Under what circumstances is the formula $\exp(M_1)\exp(M_2) = \exp(M_1 + M_2)$ valid?


11 Generalizing Exercise 9, express the solution of the equation

$$(*) \qquad \frac{d^kx}{dt^k} + a_{k-1}\frac{d^{k-1}x}{dt^{k-1}} + a_{k-2}\frac{d^{k-2}x}{dt^{k-2}} + \cdots + a_1\frac{dx}{dt} + a_0 x = 0,$$

in which $a_{k-1}, a_{k-2}, \ldots, a_1, a_0$ are numbers and $x(t)$ is a real-valued function, in terms of the exponential of a $k \times k$ matrix. An equation of the form (*) is described in words by saying that it is a linear, homogeneous, ordinary differential equation of order $k$ with constant coefficients: linear because the left side is a linear function of the derivatives, homogeneous because the right side is zero, with constant coefficients because the $a$'s are constant.
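Exercises 9-11 can be sketched using one common reduction. This reduction is an assumption here, because the text only hints at it. Write $Y = (x, dx/dt, \ldots, d^{k-1}x/dt^{k-1})$. Then (*) becomes $dY/dt = MY$, where $M$ is the companion matrix: it has ones just above the diagonal and last row $(-a_0, -a_1, \ldots, -a_{k-1})$. The solution is then $Y(t) = \exp(tM)Y(0)$.

The code below defines $\exp$ by its truncated power series. It checks the case $d^2x/dt^2 + x = 0$ and also tests the commuting condition of Exercise 10.

```python
import math

def mat_mul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(c, P):
    return [[c * v for v in row] for row in P]

def mat_exp(M, terms=60):
    """exp(M) = I + M + M^2/2! + ..., truncated after `terms` terms (fine for small entries)."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(1.0 / k, mat_mul(term, M))          # now term = M^k / k!
        result = [[r + s for r, s in zip(rr, sr)] for rr, sr in zip(result, term)]
    return result

def close(P, Q, tol=1e-9):
    return all(abs(p - q) < tol for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

# x'' + x = 0 as d/dt (x, x') = M (x, x'): companion matrix with k = 2, a_1 = 0, a_0 = 1.
M = [[0.0, 1.0], [-1.0, 0.0]]
t = 0.7
E = mat_exp(scale(t, M))
print(close(E, [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]))   # True

A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]                       # AB != BA
print(close(mat_mul(mat_exp(A), mat_exp(B)), mat_exp([[0.0, 1.0], [1.0, 0.0]])))  # False
print(close(mat_mul(mat_exp(A), mat_exp(scale(2, A))), mat_exp(scale(3, A))))      # True: A, 2A commute
```

For $x'' + x = 0$, the first row of $\exp(tM)$ gives $x(t) = x(0)\cos t + x'(0)\sin t$. The non-commuting pair shows why Exercise 10 needs some hypothesis such as $M_1M_2 = M_2M_1$.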


7.5 Three Global Problems


This section deals with three specific problems which are 
not of a local nature and which therefore cannot be 
solved by successive approximations. 


Problem 1 


Find $\pi$ to several decimal places. (By Exercise 1, §7.4, this is virtually the same as the problem 'construct a trigonometric table'. Because of the practical value of trigonometric tables, in astronomy and navigation, for example, the historical importance of this problem is very great.)

By definition,

$$\pi = \iint_D dx\, dy$$

where $D$ is the disk $\{x^2 + y^2 \le 1\}$ oriented $dx\, dy$. By Stokes' Theorem this is

$$\pi = \frac{1}{2}\int_{\partial D} (x\, dy - y\, dx)$$

which can be written as a single integral by parameterizing $\partial D$. A simple way of parameterizing the circle $\partial D$ is


7.5 | Three Global Problems 




257 


the following: Given two points $(x_0, y_0)$, $(x_1, y_1)$ on the circle, consider the point $(X, Y)$ where the tangent lines to the circle at $(x_0, y_0)$ and $(x_1, y_1)$ intersect. The tangent line to the circle $x^2 + y^2 = \text{const.}$ at $(x_0, y_0)$ is $x_0\, dx + y_0\, dy = 0$ (that is, $x_0(x - x_0) + y_0(y - y_0) = 0$), which gives the equation

$$x_0 x + y_0 y = 1$$

for this line. Parametrically it is given by

$$X = x_0 - t y_0, \qquad Y = y_0 + t x_0$$

(where the sign of $t$ is chosen to agree with the counterclockwise orientation of the circle). If this point also lies on the line $x_1 x + y_1 y = 1$ tangent to the circle at $(x_1, y_1)$, then

$$x_1(x_0 - t y_0) + y_1(y_0 + t x_0) = 1$$

from which

$$t = \frac{1 - x_0 x_1 - y_0 y_1}{x_0 y_1 - x_1 y_0}.$$

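When $(x_0, y_0)$ is fixed, this establishes a correspondence between points $(x_1, y_1)$ of the circle (other than $(x_1, y_1) = \pm(x_0, y_0)$) and real numbers $t$. To parameterize the circle, that is, to write $(x_1, y_1)$ in terms of $t$, it suffices to note the following. By symmetry, the midpoint of the chord joining $(x_0, y_0)$ and $(x_1, y_1)$ is the foot of the perpendicular from $(x_0, y_0)$ to the line through the origin and $(X, Y)$. Since $x_0 X + y_0 Y = 1$ and $X^2 + Y^2 = 1 + t^2$, this midpoint is

$$\frac{x_0 + x_1}{2} = \frac{x_0 - t y_0}{1 + t^2}, \qquad \frac{y_0 + y_1}{2} = \frac{y_0 + t x_0}{1 + t^2},$$

and therefore

$$x_1 = \frac{(1 - t^2)x_0 - 2t y_0}{1 + t^2}, \qquad y_1 = \frac{2t x_0 + (1 - t^2) y_0}{1 + t^2}.$$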




A straightforward computation shows that the pullback of $\frac{1}{2}[x_1\, dy_1 - y_1\, dx_1]$ under this map (when $x_0, y_0$ are fixed) is simply $(1 + t^2)^{-1}\, dt$. This proves that the integral of $\frac{1}{2}[x\, dy - y\, dx]$ over the arc from $(x_0, y_0)$ to $(x_1, y_1)$, oriented so that the arc does not pass through $(-x_0, -y_0)$, is

$$\int_0^T \frac{dt}{1 + t^2}, \qquad \text{where} \quad T = \frac{1 - x_0 x_1 - y_0 y_1}{x_0 y_1 - x_1 y_0}.$$
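The parameterization can be checked numerically. The sketch below uses illustrative helper names. It picks a point $(x_0, y_0)$ on the circle and a value of $t$, then computes $(x_1, y_1)$ from the formulas for $x_1, y_1$. It then confirms four things:

- $(x_1, y_1)$ lies on the circle.
- $(X, Y)$ lies on the tangent at $(x_1, y_1)$.
- The formula for $T$ recovers $t$.
- Half the angle from $(x_0, y_0)$ to $(x_1, y_1)$ equals $\arctan t = \int_0^t ds/(1 + s^2)$.

```python
import math

def tangent_parameter(p0, p1):
    """t with (x0 - t*y0, y0 + t*x0) = intersection of the tangents at p0 and p1."""
    (x0, y0), (x1, y1) = p0, p1
    return (1 - x0 * x1 - y0 * y1) / (x0 * y1 - x1 * y0)

def point_from_parameter(p0, t):
    """Second point of tangency (x1, y1) as a function of t, for fixed p0."""
    x0, y0 = p0
    d = 1 + t * t
    return (((1 - t * t) * x0 - 2 * t * y0) / d, (2 * t * x0 + (1 - t * t) * y0) / d)

p0 = (math.cos(0.3), math.sin(0.3))
t = 0.4
x1, y1 = point_from_parameter(p0, t)
X, Y = p0[0] - t * p0[1], p0[1] + t * p0[0]          # intersection of the two tangents
angle = math.atan2(p0[0] * y1 - x1 * p0[1], p0[0] * x1 + p0[1] * y1)  # angle from p0 to p1

print(x1 * x1 + y1 * y1)                 # 1 up to rounding: p1 is on the circle
print(x1 * X + y1 * Y)                   # 1: (X, Y) lies on the tangent line at p1
print(tangent_parameter(p0, (x1, y1)))   # recovers t = 0.4
print(angle / 2, math.atan(t))           # half the angle equals the integral of dt/(1 + t^2)
```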


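If $|T| < 1$, then this integral can be computed using the geometric series

$$\frac{1}{1 + t^2} = 1 - t^2 + t^4 - t^6 + \cdots.$$

Specifically,

$$\begin{aligned}
\int_0^T \frac{dt}{1 + t^2} &= \int_0^T \frac{1 - (-t^2)^n}{1 - (-t^2)}\, dt + \int_0^T \frac{(-t^2)^n}{1 - (-t^2)}\, dt \\
&= \int_0^T \left[1 - t^2 + t^4 - \cdots + (-t^2)^{n-1}\right] dt + R_n(T) \\
&= T - \frac{T^3}{3} + \frac{T^5}{5} - \cdots + (-1)^{n-1}\frac{T^{2n-1}}{2n-1} + R_n(T)
\end{aligned}$$

where

$$|R_n(T)| = \left|\int_0^T \frac{t^{2n}}{1 + t^2}\, dt\right| \le \left|\int_0^T t^{2n}\, dt\right| = \frac{|T|^{2n+1}}{2n+1}.$$

Thus the integral can be computed using the series

$$\int_0^T \frac{dt}{1 + t^2} = T - \frac{T^3}{3} + \frac{T^5}{5} - \frac{T^7}{7} + \cdots$$

and the error resulting from taking only a finite number of terms is less than the size of the first term omitted.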
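A short check of the error estimate follows. It uses Python's `math.atan` as the reference value of $\int_0^T dt/(1 + t^2)$.

```python
import math

def arctan_partial_sum(T, n):
    """T - T^3/3 + ... + (-1)^(n-1) T^(2n-1)/(2n-1), with the bound |T|^(2n+1)/(2n+1) on |R_n(T)|."""
    s = sum((-1) ** k * T ** (2 * k + 1) / (2 * k + 1) for k in range(n))
    return s, abs(T) ** (2 * n + 1) / (2 * n + 1)

for n in range(1, 6):
    s, bound = arctan_partial_sum(1 / 3, n)
    # math.atan(T) is the integral of dt/(1 + t^2) from 0 to T
    print(n, s, abs(s - math.atan(1 / 3)), bound)
```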

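If the entire circle is divided into arcs using the points $(\pm 1, 0)$, $(0, \pm 1)$, $(\pm\frac{3}{5}, \pm\frac{4}{5})$, $(\pm\frac{4}{5}, \pm\frac{3}{5})$, then there are 8 arcs like the arc from $(1, 0)$ to $(\frac{4}{5}, \frac{3}{5})$, for which

$$T = \frac{1 - (1)(\frac{4}{5}) - (0)(\frac{3}{5})}{(1)(\frac{3}{5}) - (\frac{4}{5})(0)} = \frac{1/5}{3/5} = \frac{1}{3},$$

and 4 arcs like the arc from $(\frac{4}{5}, \frac{3}{5})$ to $(\frac{3}{5}, \frac{4}{5})$, for which

$$T = \frac{1 - \frac{12}{25} - \frac{12}{25}}{\frac{16}{25} - \frac{9}{25}} = \frac{1/25}{7/25} = \frac{1}{7}.$$

Hence

$$\begin{aligned}
\pi &= 8\int_0^{1/3} \frac{dt}{1 + t^2} + 4\int_0^{1/7} \frac{dt}{1 + t^2} \\
&= 8\left[\tfrac{1}{3} - \tfrac{1}{3}(\tfrac{1}{3})^3 + \tfrac{1}{5}(\tfrac{1}{3})^5 - \cdots\right] + 4\left[\tfrac{1}{7} - \tfrac{1}{3}(\tfrac{1}{7})^3 + \tfrac{1}{5}(\tfrac{1}{7})^5 - \cdots\right].
\end{aligned}$$

Using these series it is easy to find $\pi$ with 3-place accuracy. In the table below, positive terms are in the left column and negative terms in the right column:

$$\begin{aligned}
\tfrac{8}{3} &= 2.6667 & \tfrac{8}{3 \cdot 3^3} &= \tfrac{8}{81} = .0988 \\
\tfrac{8}{5 \cdot 3^5} &= \tfrac{8}{1215} = .0066 & \tfrac{8}{7 \cdot 3^7} &= \tfrac{8}{15{,}309} = .0005 \\
\tfrac{8}{9 \cdot 3^9} &= .0000 & & \\
\tfrac{4}{7} &= .5714 & \tfrac{4}{3 \cdot 7^3} &= \tfrac{4}{1029} = .0039 \\
\tfrac{4}{5 \cdot 7^5} &= .0000 & &
\end{aligned}$$

The positive terms add up to $2.6667 + .0066 + .5714 = 3.2447$. The negative terms add up to $.0988 + .0005 + .0039 = .1032$. Therefore $3.2447 - .1032 = 3.1415$.

Hence $\pi = 3.141$ to three decimal places.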
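The same arithmetic can be done in floating point. The sketch keeps exactly the terms used in the table: four at $T = 1/3$ and two at $T = 1/7$.

```python
import math

def arctan_partial_sum(T, n):
    """First n terms of T - T^3/3 + T^5/5 - ..."""
    return sum((-1) ** k * T ** (2 * k + 1) / (2 * k + 1) for k in range(n))

# Same terms as in the table: four terms at T = 1/3, two at T = 1/7.
pi_approx = 8 * arctan_partial_sum(1 / 3, 4) + 4 * arctan_partial_sum(1 / 7, 2)
# The omitted tails are bounded by the first omitted terms, 8/(9*3^9) and 4/(5*7^5).
error_bound = 8 / (9 * 3 ** 9) + 4 / (5 * 7 ** 5)
print(round(pi_approx, 4), error_bound, abs(pi_approx - math.pi))
```

Since the bound is below $10^{-4}$, the result $3.1415$ certainly gives the first three decimals of $\pi$ correctly.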

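Many more decimal places of $\pi$ can be found with relative ease by using the following decomposition of the circle into arcs. The arc from $(1, 0)$ to $(\frac{12}{13}, \frac{5}{13})$ corresponds to

$$T = \frac{1 - (1)(\frac{12}{13}) - (0)(\frac{5}{13})}{(1)(\frac{5}{13}) - (\frac{12}{13})(0)} = \frac{1/13}{5/13} = \frac{1}{5}.$$

Twice this arc is very nearly an eighth of the circle, because $(x_0, y_0) = (\frac{12}{13}, \frac{5}{13})$ and $t = \frac{1}{5}$ give

$$\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} \dfrac{(1 - \frac{1}{25})\frac{12}{13} - \frac{2}{5}\cdot\frac{5}{13}}{1 + \frac{1}{25}} \\[2ex] \dfrac{\frac{2}{5}\cdot\frac{12}{13} + (1 - \frac{1}{25})\frac{5}{13}}{1 + \frac{1}{25}} \end{pmatrix} = \begin{pmatrix} \frac{119}{169} \\[1ex] \frac{120}{169} \end{pmatrix},$$

which is a point just past the point where $x = y$. Thus 16 of these arcs cover the entire circle with an excess of 4 times the arc from $(\frac{120}{169}, \frac{119}{169})$ to $(\frac{119}{169}, \frac{120}{169})$, for which

$$T = \frac{1 - 2(\frac{120}{169})(\frac{119}{169})}{(\frac{120}{169})^2 - (\frac{119}{169})^2} = \frac{169^2 - 2(119)(120)}{(120 - 119)(120 + 119)} = \frac{28{,}561 - 28{,}560}{239} = \frac{1}{239}.$$

Thus

$$\pi = 16\int_0^{1/5} \frac{dt}{1 + t^2} - 4\int_0^{1/239} \frac{dt}{1 + t^2}.$$

Using this formula it is easy to compute 10 decimal places of $\pi$. (The fact that $\frac{1}{5} = 2 \cdot 10^{-1}$ simplifies the computation even further.)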
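Here is a minimal sketch of the ten-place computation. It uses exact rational arithmetic (Python's `fractions.Fraction`), so the only error comes from truncating the two alternating series. The stopping tolerance $10^{-12}$ is an arbitrary choice; it keeps the total error below $20 \times 10^{-12}$.

```python
from fractions import Fraction

def arctan_inverse(m, tol):
    """Exact partial sum of arctan(1/m) = 1/m - 1/(3 m^3) + ..., stopping at the first
    term smaller than tol; the error is then less than that omitted term."""
    total, k = Fraction(0), 0
    while True:
        term = Fraction(1, (2 * k + 1) * m ** (2 * k + 1))
        if term < tol:
            return total
        total += term if k % 2 == 0 else -term
        k += 1

tol = Fraction(1, 10 ** 12)
pi_approx = 16 * arctan_inverse(5, tol) - 4 * arctan_inverse(239, tol)  # error < 20 * tol
print(f"{float(pi_approx):.10f}")
```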


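Problem 2


Find $\log 10$ to several decimal places. [As was seen in Exercise 2 of §7.4, this is the essential step in tabulating the function $10^x$, that is, in constructing a table of antilogarithms to the base 10.]

The basic formulas here are

$$e^{\log x} = x \qquad \text{(definition of } \log x\text{)}$$

$$\log(xy) = \log x + \log y \qquad \text{(addition formula for } e^x\text{)}$$

and

$$\log\left(\frac{1 + x}{1 - x}\right) = 2\left[x + \frac{x^3}{3} + \frac{x^5}{5} + \frac{x^7}{7} + \cdots\right].$$

The last formula is obtained by setting

$$y(x) = \log\left(\frac{1 + x}{1 - x}\right), \qquad \text{so that} \qquad e^{y(x)} = \frac{1 + x}{1 - x},$$

and by differentiating to obtain

$$e^{y(x)}\, y'(x) = \frac{d}{dx}\left(\frac{1 + x}{1 - x}\right) = \frac{2}{(1 - x)^2},$$

$$y'(x) = \frac{1 - x}{1 + x}\cdot\frac{2}{(1 - x)^2} = \frac{2}{1 - x^2},$$

$$y(x) = y(x) - y(0) = \int_0^x \frac{2\, dt}{1 - t^2}.$$

Expanding $2/(1 - t^2) = 2(1 + t^2 + t^4 + \cdots)$ as a geometric series and integrating term by term gives the series above.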




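The error resulting from setting

$$\log\left(\frac{1 + x}{1 - x}\right) \approx 2\left[x + \frac{x^3}{3} + \cdots + \frac{x^{2n-1}}{2n-1}\right] \qquad (|x| < 1)$$

has absolute value at most

$$\left|\int_0^x \frac{2t^{2n}\, dt}{1 - t^2}\right| \le \frac{2|x|^{2n+1}}{(1 - x^2)(2n + 1)}.$$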
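The partial sums and the error bound can be checked numerically against `math.log`. The sample value $x = 0.2$ is arbitrary.

```python
import math

def log_ratio_partial(x, n):
    """2[x + x^3/3 + ... + x^(2n-1)/(2n-1)] and the bound 2|x|^(2n+1)/((1-x^2)(2n+1))."""
    s = 2 * sum(x ** (2 * k + 1) / (2 * k + 1) for k in range(n))
    bound = 2 * abs(x) ** (2 * n + 1) / ((1 - x * x) * (2 * n + 1))
    return s, bound

for n in (1, 2, 3, 4):
    s, bound = log_ratio_partial(0.2, n)
    print(n, s, abs(s - math.log(1.2 / 0.8)), bound)   # actual error never exceeds the bound
```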
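These formulas give an effective means of computing logarithms of numbers near 1. The computation of $\log 10$ then becomes a matter of ingenuity in writing 10 as a product of several factors near 1. One way of doing this is the following:

$$2^{10} = 1024 = 10^3\left(\frac{1024}{1000}\right),$$

$$10^3 = 2^{10}\left(\frac{1012 - 12}{1012 + 12}\right) = 2^{10}\left(\frac{1 - \frac{3}{253}}{1 + \frac{3}{253}}\right).$$

To find the log of 10, it suffices to find the log of the number on the right in the preceding equation and divide by 3. The log of the second factor can be found to several places using the series above. For example, to five decimal places,

$$\tfrac{3}{253} = .011858, \qquad \left(\tfrac{3}{253}\right)^3 = .000002,$$

$$\begin{aligned}
\log 10 &= \tfrac{1}{3}\left[10\log 2 - 2\left(\tfrac{3}{253} + \tfrac{1}{3}\left(\tfrac{3}{253}\right)^3 + \cdots\right)\right] \\
&= \tfrac{10}{3}\log 2 - \tfrac{2}{3}(.011858) - \cdots \\
&= \tfrac{10}{3}\log 2 - .007905\ldots
\end{aligned}$$

Now to find $\log 2$ one can write

$$2 = \left(\frac{6}{5}\right)^4 \cdot \frac{625}{648} = \left(\frac{12}{10}\right)^4\left(\frac{27}{28}\right)\left(\frac{4375}{4374}\right) = \left(\frac{11 + 1}{11 - 1}\right)^4\left(\frac{55 - 1}{55 + 1}\right)\left(\frac{8749 + 1}{8749 - 1}\right),$$

$$\log 2 = 4\log\left(\frac{1 + \frac{1}{11}}{1 - \frac{1}{11}}\right) - \log\left(\frac{1 + \frac{1}{55}}{1 - \frac{1}{55}}\right) + \log\left(\frac{1 + \frac{1}{8749}}{1 - \frac{1}{8749}}\right),$$

for example. Then $\frac{10}{3}\log 2$ can be computed to 6 decimal places relatively easily.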




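$$\begin{aligned}
\tfrac{40}{3}\log\left(\frac{1 + \frac{1}{11}}{1 - \frac{1}{11}}\right) &= \tfrac{80}{3}\left[\tfrac{1}{11} + \tfrac{1}{3 \cdot 11^3} + \tfrac{1}{5 \cdot 11^5} + \cdots\right] \\
\tfrac{80}{33} &= 2.424242 \\
\tfrac{80}{9 \cdot 11^3} &= .006678 \\
\tfrac{16}{3 \cdot 11^5} &= .000033 \\
\text{total} &= 2.430953
\end{aligned}$$

$$\tfrac{10}{3}\log\left(\frac{1 + \frac{1}{55}}{1 - \frac{1}{55}}\right) = \tfrac{20}{3}\left[\tfrac{1}{55} + \tfrac{1}{3 \cdot 55^3} + \cdots\right] = .121212 + .000013 = .121225$$

$$\tfrac{10}{3}\log\left(\frac{1 + \frac{1}{8749}}{1 - \frac{1}{8749}}\right) \approx \tfrac{20}{3}\cdot\tfrac{1}{8749} = .000762.$$

Hence for $\log 10$ the positive terms give $2.430953 + .000762 = 2.431715$. The negative terms give $.007905 + .121225 = .129130$. Therefore

$$\log 10 = 2.431715 - .129130 = 2.302585,$$

that is, $\log 10 = 2.30259$ to five decimal places.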
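The whole computation can be replayed in floating point. The sketch first confirms the two factorizations exactly with `fractions.Fraction`. It then sums a few more terms of each series than the hand computation does, so it is accurate to about twelve places rather than six.

```python
import math
from fractions import Fraction

def log_ratio(x, n_terms):
    """log((1+x)/(1-x)) via 2[x + x^3/3 + ...] with n_terms terms."""
    return 2 * sum(x ** (2 * k + 1) / (2 * k + 1) for k in range(n_terms))

# Exact identities behind the computation, checked in rational arithmetic.
assert Fraction(10) ** 3 == 2 ** 10 * (1 - Fraction(3, 253)) / (1 + Fraction(3, 253))
assert Fraction(2) == Fraction(12, 10) ** 4 * Fraction(27, 28) * Fraction(4375, 4374)

# log 2 = 4 log((1+1/11)/(1-1/11)) - log((1+1/55)/(1-1/55)) + log((1+1/8749)/(1-1/8749))
log2 = 4 * log_ratio(1 / 11, 6) - log_ratio(1 / 55, 4) + log_ratio(1 / 8749, 2)
# 3 log 10 = 10 log 2 + log(1000/1024) = 10 log 2 - log((1+x)/(1-x)) with x = 3/253
log10 = (10 * log2 - log_ratio(3 / 253, 4)) / 3
print(f"{log2:.8f} {log10:.6f}")
```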

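Problem 3


Prove that Newton's method converges to a solution of $y = x^n$ for any positive number $y$ and any positive integer $n$, regardless of the choice of the initial approximation $x_0 > 0$. (This proves in a very constructive way that every positive number has an $n$th root.)

Newton's method in this case can be written

$$\begin{aligned}
\text{desired } \Delta y &= y - x_N^n \\
\Delta y &\approx n x_N^{n-1}\, \Delta x \\
\text{chosen } \Delta x = \delta_N &= \frac{y - x_N^n}{n x_N^{n-1}} \\
x_{N+1} &= x_N + \delta_N.
\end{aligned}$$

Then

$$\begin{aligned}
x_{N+1}^n &= (x_N + \delta_N)^n \\
&= x_N^n + n x_N^{n-1}\delta_N + \frac{n(n-1)}{2} x_N^{n-2}\delta_N^2 + \cdots \\
&= y + \frac{n(n-1)}{2} x_N^{n-2}\delta_N^2 + \binom{n}{3} x_N^{n-3}\delta_N^3 + \cdots.
\end{aligned}$$

If $x_N$ was too small ($x_N^n < y$), then $\delta_N > 0$ and $x_{N+1}$ is too large. If $x_N$ was too large, then

$$\begin{aligned}
0 < y &< x_N^n \\
-x_N^n < y - x_N^n &< 0 \\
-x_N^n < n x_N^{n-1}\delta_N &< 0 \\
-x_N < n\delta_N &< 0.
\end{aligned}$$

Thus $\delta_N$ is negative but $|\delta_N| < x_N/n$. In the expansion

$$x_{N+1}^n - y = \sum_{j=2}^{n} \binom{n}{j} x_N^{n-j}\delta_N^j$$

it is easily shown that the terms alternate in sign and decrease in size. Thus the sign of $x_{N+1}^n - y$ is the sign of the term $j = 2$, which is $+$. Therefore $x_{N+1}$ is again too large, and $\delta_{N+1}$ is negative. Regardless of the choice of $x_0$, then, the first approximation $x_1$ and all subsequent approximations are too large ($x_1 > x_2 > x_3 > \cdots$). Furthermore, the first term in the expansion of $x_{N+1}^n - y$ is larger than the actual value. Hence, for $N \ge 1$,

$$|\delta_{N+1}| = \frac{x_{N+1}^n - y}{n x_{N+1}^{n-1}} < \frac{n-1}{2}\cdot\frac{x_N^{n-2}\, x_{N+1}}{x_{N+1}^n}\,\delta_N^2 < \frac{n-1}{2}\cdot\frac{x_1^{n-1}}{y}\,\delta_N^2 = K\delta_N^2,$$

where

$$K = \frac{n-1}{2}\cdot\frac{x_1^{n-1}}{y}.$$

Now $x_{N+1} = x_1 + \delta_1 + \delta_2 + \cdots + \delta_N > 0$. If $|\delta_i| \ge 1/K$ for $i = 1, 2, \ldots, N$, then

$$\frac{N}{K} \le |\delta_1| + |\delta_2| + \cdots + |\delta_N| < x_1, \qquad \text{so} \qquad N < Kx_1.$$

Therefore, after more than $Kx_1$ steps, at least one step must occur in which $|\delta_i| < 1/K$. Then $|\delta_{i+1}| < K|\delta_i|\,|\delta_i| = \rho|\delta_i|$, where $\rho = K|\delta_i| < 1$. Each succeeding step is smaller by a factor of at least $\rho$, so the Cauchy Criterion is satisfied and the sequence converges.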
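The sketch below runs the iteration and illustrates the behavior proved above. From any $x_0 > 0$, the first iterate overshoots, possibly enormously, and the later iterates decrease to the root. The starting value and the step count are arbitrary.

```python
def newton_nth_root(y, n, x0, steps=80):
    """Newton iterates x_{N+1} = x_N + delta_N, delta_N = (y - x_N**n) / (n * x_N**(n-1))."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + (y - x ** n) / (n * x ** (n - 1)))
    return xs

xs = newton_nth_root(2.0, 5, 0.1)
print(xs[:4])                 # x_1 is far too large; from then on the iterates decrease
print(xs[-1], xs[-1] ** 5)
```

With $x_0 = 0.1$ and $n = 5$, the first step lands near 4000. While the iterate is still far above the root, each step shrinks it by roughly the factor $1 - 1/n$. Once it is near the root, convergence is very fast. In this example the bound $Kx_1$ from the proof is far from sharp.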




Exercises 


1 Find $\pi$ with 10-place accuracy.

2 Find $\log 2$ with 8-place accuracy.

3 Find $\log 3$ with 5-place accuracy.

4 Show that if $x = i = \sqrt{-1}$ is substituted in the series for $\log\left(\frac{1 + x}{1 - x}\right)$, and if the series is considered as $i$ times the integral of $\frac{1}{2}(x\, dy - y\, dx)$ over an arc, the result is $\log i = \frac{\pi}{2} i$. Show that this is a true formula if $i$ is taken to be the 'complex number'

$$i = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},$$

if $\log x = y$ is defined to mean $x = e^y$, and if the exponential of a complex number is defined as in Exercise 4, §7.4.
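Both halves of Exercise 4 can be explored numerically. The first part substitutes $x = i$ into the series using Python complex numbers. Convergence is slow: after $n$ terms the error is on the order of $1/n$. The second part computes $\exp((\pi/2)J)$ by its power series, where $J$ is the matrix above. This only illustrates the claim and does not prove it.

```python
import math

def log_ratio_partial(x, n):
    """2[x + x^3/3 + ...] with n terms; x may be a complex number."""
    total, power = 0, x
    for k in range(n):
        total += power / (2 * k + 1)
        power *= x * x                     # next odd power of x
    return 2 * total

def exp_2x2(M, terms=40):
    """exp(M) for a 2x2 matrix M by its power series."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[(term[i][0] * M[0][j] + term[i][1] * M[1][j]) / k for j in range(2)]
                for i in range(2)]         # term = M^k / k!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

print(log_ratio_partial(1j, 100_000), 1j * math.pi / 2)   # 2i(1 - 1/3 + 1/5 - ...) -> (pi/2) i

J = [[0.0, -1.0], [1.0, 0.0]]                             # the matrix playing the role of i
print(exp_2x2([[math.pi / 2 * v for v in row] for row in J]))  # approximately J itself
```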


Chapter 8 Applications


8.1 Vector Calculus

The so-called 'vector calculus' is not a separate calculus at all, but a particular notation for the calculus of three variables. Many physical quantities, notably force fields, flows, and gradients, are naturally imagined as vector fields, that is, as sets of arrows in $xyz$-space, one arrow to describe the quantity (of force, flow, or gradient) at each point of space. Analytically, a vector field is written $A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$, where $\mathbf{i}, \mathbf{j}, \mathbf{k}$ are imagined as arrows of unit length in the $x$-, $y$-, $z$-directions respectively, and where the components $A, B, C$ of the arrow

$$A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$$

are functions of $(x, y, z)$. In short, a vector field on $xyz$-space is described by three functions $A, B, C$ on $xyz$-space, $A = A(x, y, z)$, etc., and is imagined geometrically as a field of arrows, of which $A, B, C$ are the components. The letter $\mathbf{X} = A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$ will denote such a vector field on $xyz$-space.


Line Integrals


The integral $\int_\Gamma (A\, dx + B\, dy + C\, dz)$ of a 1-form over an oriented curve $\Gamma$ in $xyz$-space is also denoted by $\int_\Gamma \mathbf{X} \cdot d\mathbf{x}$, or some similar notation. Here $\mathbf{X}$ is the vector field $A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$ determined by the 1-form $A\, dx + B\, dy + C\, dz$, $d\mathbf{x}$ is an 'infinitesimal vector along the curve $\Gamma$', and $\mathbf{X} \cdot d\mathbf{x}$ is the 'dot product' of these two vectors. In whatever way the 'infinitesimal vector' $d\mathbf{x}$ is




H.M. Edwards, Advanced Calculus: A Differential Forms Approach, Modern Birkhauser Classics, 
DOI 10.1007/978-0-8176-8412-9_8, © Harold M. Edwards 2014 


Chapter8 | Applications 


266 


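defined, the net effect is that the symbol $\int_\Gamma \mathbf{X} \cdot d\mathbf{x}$ means $\int_\Gamma (A\, dx + B\, dy + C\, dz)$. Thus if $\Gamma$ is given parametrically by

$$(1) \qquad x = f(t), \quad y = g(t), \quad z = h(t) \qquad (a \le t \le b)$$

then

$$\int_\Gamma \mathbf{X} \cdot d\mathbf{x} \quad \text{means the number} \quad \int_a^b \left(A\frac{df}{dt} + B\frac{dg}{dt} + C\frac{dh}{dt}\right) dt$$

where $A, B, C$ are functions of $t$ by composition with (1).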


Surface Integrals


The integral $\int_S (A\, dy\, dz + B\, dz\, dx + C\, dx\, dy)$ of a 2-form over an oriented surface $S$ in $xyz$-space is also denoted by $\int_S \mathbf{X} \cdot \mathbf{n}\, d\sigma$, or some similar notation. The symbols mean the following:

- $\mathbf{X}$ is the vector field $A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$ determined by the 2-form $A\, dy\, dz + B\, dz\, dx + C\, dx\, dy$.
- $\mathbf{n}$ is the unit vector normal to the surface $S$. The sign of $\mathbf{n}$ is determined by the orientation of $S$ according to the right hand rule.
- $d\sigma$ is the area of an 'infinitesimal element' of the surface $S$.
- $\mathbf{X} \cdot \mathbf{n}$ denotes the dot product.

In whatever way the individual symbols $\mathbf{n}$ and $d\sigma$ are defined, the net effect is that the entire symbol $\int_S \mathbf{X} \cdot \mathbf{n}\, d\sigma$ means $\int_S (A\, dy\, dz + B\, dz\, dx + C\, dx\, dy)$ as defined in §6.3, or, less precisely, in §2.2.


Volume Integrals


The integral $\int_D F\, dx\, dy\, dz$ of a 3-form $F\, dx\, dy\, dz$ over a three-dimensional domain $D$ in $xyz$-space is also denoted $\int_D F\, dV$, where $dV$ is the 'element of volume' $dV = dx\, dy\, dz$. Normally an orientation of $D$ is not given. It is then understood that $\int_D F\, dx\, dy\, dz$ is defined by using the standard orientation ($dx\, dy\, dz$ positive) for $D$.


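In this notation the three cases of Stokes' Theorem $\int_{\partial S} \omega = \int_S d\omega$ ($\omega$ a 0-, 1-, or 2-form) have very dissimilar forms:

$\omega$ a 0-form. If $\omega(x, y, z) = f(x, y, z)$ is a function (0-form) and if $C$ is an oriented curve with end points $C^+$, $C^-$, then

$$f(C^+) - f(C^-) = \int_C \mathbf{X} \cdot d\mathbf{x},$$

where $\mathbf{X}$ is the vector field $\frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} + \frac{\partial f}{\partial z}\mathbf{k}$ determined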


8.1 | Vector Calculus 267 


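by the 1-form $df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy + \frac{\partial f}{\partial z}dz$. For this reason one defines the gradient of a function $f$ to be the vector field

$$\operatorname{grad} f = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} + \frac{\partial f}{\partial z}\mathbf{k}$$

so that the formula becomes

$$(2) \qquad f(C^+) - f(C^-) = \int_C \operatorname{grad} f \cdot d\mathbf{x}.$$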


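$\omega$ a 1-form. If $\omega = A\, dx + B\, dy + C\, dz$ is the 1-form corresponding to the vector field $A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$, and if $S$ is an oriented two-dimensional surface with oriented boundary $\partial S$, then the formula is

$$\int_{\partial S} \mathbf{X} \cdot d\mathbf{x} = \int_S \mathbf{Y} \cdot \mathbf{n}\, d\sigma,$$

where $\mathbf{Y}$ is the vector field determined by the 2-form

$$d\omega = \left(\frac{\partial C}{\partial y} - \frac{\partial B}{\partial z}\right) dy\, dz + \left(\frac{\partial A}{\partial z} - \frac{\partial C}{\partial x}\right) dz\, dx + \left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) dx\, dy.$$

For this reason one defines the curl of a vector field $\mathbf{X} = A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$ to be the vector field

$$\operatorname{curl} \mathbf{X} = \left(\frac{\partial C}{\partial y} - \frac{\partial B}{\partial z}\right)\mathbf{i} + \left(\frac{\partial A}{\partial z} - \frac{\partial C}{\partial x}\right)\mathbf{j} + \left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right)\mathbf{k}$$

so that the formula becomes

$$(3) \qquad \int_{\partial S} \mathbf{X} \cdot d\mathbf{x} = \int_S (\operatorname{curl} \mathbf{X}) \cdot \mathbf{n}\, d\sigma.$$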


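$\omega$ a 2-form. If $\omega = A\, dy\, dz + B\, dz\, dx + C\, dx\, dy$ is the 2-form corresponding to the vector field $A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$, and if $D$ is a three-dimensional region whose boundary $\partial D$ is oriented by the usual rules, then

$$\int_{\partial D} \mathbf{X} \cdot \mathbf{n}\, d\sigma = \int_D F\, dV,$$

where $F$ is the function $\frac{\partial A}{\partial x} + \frac{\partial B}{\partial y} + \frac{\partial C}{\partial z}$ determined by the 3-form $d\omega = \left(\frac{\partial A}{\partial x} + \frac{\partial B}{\partial y} + \frac{\partial C}{\partial z}\right) dx\, dy\, dz$. For this reason, one defines the divergence of a vector field $\mathbf{X} = A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$ to be the function

$$\operatorname{div} \mathbf{X} = \frac{\partial A}{\partial x} + \frac{\partial B}{\partial y} + \frac{\partial C}{\partial z}$$

so that the formula becomes

$$(4) \qquad \int_{\partial D} \mathbf{X} \cdot \mathbf{n}\, d\sigma = \int_D (\operatorname{div} \mathbf{X})\, dV.$$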


The formulas (2), (3), (4) are all special cases of the generalized Stokes' Theorem $\int_{\partial S} \omega = \int_S d\omega$. Formula (3) is called Stokes' Theorem (proper). Formula (4) is called the Divergence Theorem or Gauss' Theorem. Formula (2) has no special name, unless perhaps it is called the Fundamental Theorem of Calculus.

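The formula for integration by parts

$$(5) \qquad \int_S d\omega_1 \cdot \omega_2 = \int_{\partial S} \omega_1 \cdot \omega_2 - (-1)^j \int_S \omega_1 \cdot d\omega_2$$

(where $\omega_1$ is a $j$-form, $\omega_2$ a $k$-form, $S$ a $(j + k + 1)$-manifold) can also be translated into vector notation, and takes various forms for various values of $j, k$. For example, for $j = 2$, $k = 0$ it takes the form

$$\int_D f \operatorname{div} \mathbf{X}\, dV = \int_{\partial D} f\, \mathbf{X} \cdot \mathbf{n}\, d\sigma - \int_D \operatorname{grad} f \cdot \mathbf{X}\, dV.$$

There are many such formulas, frequently encountered in physics books, all of them expressing the single idea (5), which itself is just the Fundamental Theorem of Calculus

$$\int_S d(\omega_1 \cdot \omega_2) = \int_{\partial S} \omega_1 \cdot \omega_2.$$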


Much of the literature of mathematical physics is, 
regrettably, written in this vector notation of grad, curl, 
and div, so students of physics must become acquainted 
with it. In summary, grad f, curl X, div X are dw where 
w is a 0-, 1-, 2-form respectively. This notation applies 
only to three dimensions. 
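The identities $\operatorname{curl}(\operatorname{grad} f) = 0$ and $\operatorname{div}(\operatorname{curl} \mathbf{X}) = 0$ are the vector forms of $d(d\omega) = 0$. A finite-difference sketch can check them at a point. The functions `f`, `A`, `B`, `C`, the point, and the step size are arbitrary choices for illustration.

```python
import math

H = 1e-4  # central-difference step; an arbitrary choice balancing truncation and rounding

def partial(f, i):
    """Central-difference approximation of the i-th partial derivative of f(x, y, z)."""
    def df(x, y, z):
        hi, lo = [x, y, z], [x, y, z]
        hi[i] += H
        lo[i] -= H
        return (f(*hi) - f(*lo)) / (2 * H)
    return df

def grad(f):
    """Components (f_x, f_y, f_z) of grad f, as three functions."""
    return partial(f, 0), partial(f, 1), partial(f, 2)

def curl(A, B, C):
    """Components of curl(Ai + Bj + Ck), following the formula in the text."""
    return (lambda x, y, z: partial(C, 1)(x, y, z) - partial(B, 2)(x, y, z),
            lambda x, y, z: partial(A, 2)(x, y, z) - partial(C, 0)(x, y, z),
            lambda x, y, z: partial(B, 0)(x, y, z) - partial(A, 1)(x, y, z))

def div(A, B, C):
    """div(Ai + Bj + Ck) = A_x + B_y + C_z."""
    return lambda x, y, z: (partial(A, 0)(x, y, z) + partial(B, 1)(x, y, z)
                            + partial(C, 2)(x, y, z))

f = lambda x, y, z: x * x * y + math.sin(y * z)
A = lambda x, y, z: y * z * z
B = lambda x, y, z: math.exp(x) * y
C = lambda x, y, z: x * y * z

p = (1.0, 2.0, 0.5)
print([g(*p) for g in curl(*grad(f))])   # all close to 0, reflecting d(df) = 0
print(div(*curl(A, B, C))(*p))           # close to 0, reflecting d(d omega) = 0
print(div(*grad(f))(*p))                 # f_xx + f_yy + f_zz = 2y - (y^2 + z^2) sin(yz)
```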




Exercises 



1 Find the vector field $\operatorname{grad} f$ for (a)-(c).
(a) $f = x^2 + y^2 + z^2$
(b) $f = (x^2 + y^2 + z^2)^{-1/2}$
(c) $f = \log(x^2 + y^2)$ (independent of $z$)


Describe these vector fields geometrically.


2 Show that the curl of each of the vector fields found in 
Exercise 1 is zero. Find the divergence of each of these vector 
fields. 


3 Find div X and curl X for 


X = xi + yj + zk 
X = (x? + Di + xyzj + sin(x + y)k 


4 Show that the value of a constant 2-form $A\, dy\, dz + B\, dz\, dx + C\, dx\, dy$ on a parallelogram in space is $\mathbf{X} \cdot \mathbf{n}\, d\sigma$, where $\mathbf{X} = A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$, $\mathbf{n}$ is the unit vector normal to the parallelogram (suitably oriented), and $d\sigma$ is the area of the parallelogram. [Since $\mathbf{n}$ and $d\sigma$ are only vaguely defined, in terms of geometrical notions, the proof is necessarily imprecise. Since both functions are linear in $(A, B, C)$, it suffices to consider the cases '$\mathbf{X}$ parallel to $\mathbf{n}$' and '$\mathbf{X}$ perpendicular to $\mathbf{n}$', for which the desired formula is easily established.]


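5 The 'cross product' $\mathbf{a} \times \mathbf{b}$ of two vectors $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$, $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$ is defined to be the vector

$$\begin{vmatrix} a_1 & b_1 & \mathbf{i} \\ a_2 & b_2 & \mathbf{j} \\ a_3 & b_3 & \mathbf{k} \end{vmatrix} = (a_2b_3 - a_3b_2)\mathbf{i} + (a_3b_1 - a_1b_3)\mathbf{j} + (a_1b_2 - a_2b_1)\mathbf{k}$$

where the determinant on the left is merely a mnemonic. The dot product of two vectors is defined as the sum of the products of corresponding components [i.e., $\mathbf{a} \cdot \mathbf{b} = a_1b_1 + a_2b_2 + a_3b_3$]. Show that the value of $A\, dy\, dz + B\, dz\, dx + C\, dx\, dy$ on a parallelogram with sides $\mathbf{a}, \mathbf{b}$ is $\mathbf{X} \cdot (\mathbf{a} \times \mathbf{b})$, where $\mathbf{X} = A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$. This number $\mathbf{X} \cdot (\mathbf{a} \times \mathbf{b})$ is called the 'scalar triple product' of $\mathbf{X}, \mathbf{a}, \mathbf{b}$.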
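Exercises 5 and 6 can be checked with a little arithmetic. The sketch assumes that a constant 2-form is evaluated on the parallelogram spanned by $\mathbf{a}, \mathbf{b}$ by replacing $dy\, dz$ with $a_2b_3 - a_3b_2$, and similarly for the other two terms by cyclic permutation. The sample vectors are arbitrary.

```python
def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def two_form_on_parallelogram(X, a, b):
    """A dy dz + B dz dx + C dx dy on the parallelogram with sides a, b, taking
    dy dz of it to be the 2x2 determinant a2*b3 - a3*b2 (and cyclically)."""
    A, B, C = X
    return (A * (a[1] * b[2] - a[2] * b[1])
            + B * (a[2] * b[0] - a[0] * b[2])
            + C * (a[0] * b[1] - a[1] * b[0]))

def det3(r1, r2, r3):
    """3x3 determinant with rows r1, r2, r3, by cofactor expansion along r1."""
    return (r1[0] * (r2[1] * r3[2] - r2[2] * r3[1])
            - r1[1] * (r2[0] * r3[2] - r2[2] * r3[0])
            + r1[2] * (r2[0] * r3[1] - r2[1] * r3[0]))

X, a, b = (1, 2, 3), (4, 5, 6), (7, 8, 10)
print(two_form_on_parallelogram(X, a, b), dot(X, cross(a, b)), det3(X, a, b))  # -3 -3 -3
print(dot(cross(a, b), a), dot(cross(a, b), b))                                  # 0 0
```

The second line illustrates the perpendicularity in Exercise 6. The scalar triple product also equals the $3 \times 3$ determinant with rows $\mathbf{X}, \mathbf{a}, \mathbf{b}$.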


6 Combining 4 and 5 show that a X b is perpendicular to a 
and b and that its length is equal to the area (which has not 
been mathematically defined) of the parallelogram enclosed 
by a, b. 


7 Instead of $\int_S \mathbf{X} \cdot \mathbf{n}\, d\sigma$, the integral of a vector field over a surface is often denoted

$$\int_S \mathbf{X} \cdot \left(\frac{\partial \mathbf{x}}{\partial u} \times \frac{\partial \mathbf{x}}{\partial v}\right) du\, dv$$

where $u$ and $v$ are parameters describing the surface $S = \{x(u, v), y(u, v), z(u, v)\}$. Justify this notation in terms of 'infinitesimal parallelograms'.


8.2 Elementary Differential Equations

Although a differential equation and an initial condition

$$(1) \qquad \frac{dy}{dx} = f(x, y), \qquad y(\bar{x}) = \bar{y}$$

determine* $y$ locally as a function of $x$ (see §7.4), it is impossible to give a formula for the solution $y(x)$ except in certain very simple cases.

*Apply the theorem of §7.4 to the pair of equations $dx/dt = 1$, $dy/dt = f(x, y)$.

The difficulty of finding an explicit solution $y(x)$ of (1) can be gauged by looking at the very simple cases where $f$ does not depend on $y$, that is, $\frac{dy}{dx} = f(x)$. Even there, the problem is to 'find a function with a given derivative'. As was mentioned in §3.1 in connection with the Fundamental Theorem of Calculus, the fact that there exists a solution of this problem in no way implies that a formula for the solution can be given in terms of known functions. The same remark applies a fortiori to the solution of the more general equation $\frac{dy}{dx} = f(x, y)$. This section deals with a few of the basic techniques of finding formulas for solutions of differential equations of the type (1). Since it is not necessarily true that the solution can be expressed by a formula, these techniques obviously cannot be expected to apply to all cases, but only to particularly simple equations (1).

The problem is greatly simplified if one allows the solution $y(x)$ of (1) to be given implicitly rather than explicitly. That is, one accepts as a 'solution' a relation of the form $F(x, y) = \text{const.}$ which, when solved for $y$ as a function of $x$, gives a solution of (1). The rule for the implicit differentiation of $F(x, y) = \text{const.}$ is (see §5.2)


$$F(x, y) = C, \qquad \frac{\partial F}{\partial x}\, dx + \frac{\partial F}{\partial y}\, dy = 0.$$


8.2 | Elementary Differential Equations 


271 
Hence

$$\frac{dy}{dx} = -\frac{\partial F/\partial x}{\partial F/\partial y},$$

where it is assumed that $\partial F/\partial y \ne 0$ in order that $F(x, y) = \text{const.}$ can be solved for $y$ as a function of $x$. Thus $F(x, y) = C$ gives a solution of (1) when solved for $y$ if and only if, at all points of the curve $F(x, y) = C$, the 1-form $dF$ is a non-zero multiple of the 1-form $dy - f(x, y)\, dx$. This can be seen geometrically as follows:

The tangent line to the curve $F = \text{const.}$ through $(\bar{x}, \bar{y})$ is the line

$$\frac{\partial F}{\partial x}(\bar{x}, \bar{y})(x - \bar{x}) + \frac{\partial F}{\partial y}(\bar{x}, \bar{y})(y - \bar{y}) = 0,$$

which can be remembered simply as $dF = 0$. If $y(x)$ is a function satisfying (1), then the tangent line to its graph is the line

$$y - \bar{y} = y'(\bar{x})(x - \bar{x}),$$

that is, the line

$$(y - \bar{y}) - f(\bar{x}, \bar{y})(x - \bar{x}) = 0$$

or, mnemonically, the line $dy - f\, dx = 0$. Two lines

$$A_1(x - \bar{x}) + B_1(y - \bar{y}) = 0, \qquad A_2(x - \bar{x}) + B_2(y - \bar{y}) = 0$$

through a point $(\bar{x}, \bar{y})$ coincide if and only if the equation of one line is a non-zero multiple of the equation of the other line. It follows that the function $y(x)$ defined implicitly by $F(x, y) = \text{const.}$ satisfies $\frac{dy}{dx} = f(x, y)$ if and only if the lines $dF = 0$ and $dy - f\, dx = 0$ coincide. That is, it does so if and only if the 1-form $dF$ is a non-zero multiple of $dy - f\, dx$ at all points of the curve.

More generally, a curve $F(x, y) = \text{const.}$ is said to be a solution of the differential equation $A\, dx + B\, dy = 0$, where $A, B$ are functions of $x, y$, if and only if the 1-form $dF$ is a non-zero multiple of the 1-form $A\, dx + B\, dy$ at all points of the curve. Geometrically this means that at each point $(\bar{x}, \bar{y})$ of the curve the tangent line $dF = 0$ is the line $A\, dx + B\, dy = 0$. In other words, for each $(\bar{x}, \bar{y})$ on the curve the equations

$$\frac{\partial F}{\partial x}(\bar{x}, \bar{y})(x - \bar{x}) + \frac{\partial F}{\partial y}(\bar{x}, \bar{y})(y - \bar{y}) = 0$$

and

$$A(\bar{x}, \bar{y})(x - \bar{x}) + B(\bar{x}, \bar{y})(y - \bar{y}) = 0$$

define the same line through $(\bar{x}, \bar{y})$. According to this definition, solving $\frac{dy}{dx} = f(x, y)$ is equivalent to solving $dy - f(x, y)\, dx = 0$. In the remainder of the section the original equation (1) will be dropped, and in its place the differential equation

$$(2) \qquad A\, dx + B\, dy = 0$$

will be considered. The problem is to find explicit formulas for functions $F(x, y)$ such that $F = \text{const.}$ solves (2) in the sense defined above. That is, the problem is to find $F(x, y)$ such that $dF$ is a non-zero multiple of the given 1-form $A\, dx + B\, dy$ at each point.


Example

Instead of the 'derivative equation' $\frac{dy}{dx} = \frac{x}{y}$ it is easier to consider the 'differential equation' $dy - \frac{x}{y}\, dx = 0$. Multiplying by $y$, this becomes $y\, dy - x\, dx = 0$, or $d[\frac{1}{2}(y^2 - x^2)] = 0$. Hence $y^2 - x^2 = \text{const.}$ gives solutions of the equation. The factor $y$ is non-zero at all points where the given equation is meaningful. By implicit differentiation, the curves $y = \pm\sqrt{C + x^2}$ satisfy the original equation $\frac{dy}{dx} = \frac{x}{y}$.
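Example

To solve $\frac{dy}{dx} = -\frac{\sqrt{1 - y^2}}{\sqrt{1 - x^2}}$, rewrite it as

$$\begin{aligned}
dy + \frac{\sqrt{1 - y^2}}{\sqrt{1 - x^2}}\, dx &= 0 \\
\frac{dy}{\sqrt{1 - y^2}} + \frac{dx}{\sqrt{1 - x^2}} &= 0 \\
d(\operatorname{Arcsin} y) + d(\operatorname{Arcsin} x) &= 0 \\
\operatorname{Arcsin} y + \operatorname{Arcsin} x &= \text{const.}
\end{aligned}$$

Take $(\bar{x}, \bar{y})$ inside the square $\{|x| < 1, |y| < 1\}$, where the equation is defined. Then the equation

$$\operatorname{Arcsin} x + \operatorname{Arcsin} y = \operatorname{Arcsin} \bar{x} + \operatorname{Arcsin} \bar{y}$$

can be solved for $y$ as a function of $x$ for $x$ near $\bar{x}$. This gives a function $y(x)$ which satisfies the given equation and the initial condition $y(\bar{x}) = \bar{y}$.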


If a differential equation can be written in the form

$$(3) \qquad A(x)\, dx + B(y)\, dy = 0$$

then it can be solved by using the techniques of elementary calculus to find (if possible) functions $g(x)$, $h(y)$ such that $g'(x) = A(x)$ and $h'(y) = B(y)$. The solution is then given by $g(x) + h(y) = \text{const.}$ An equation of the form (3) is said to have variables separated. A differential equation which can be put in the form (3) by multiplying by a non-zero function $p(x, y)$ is said to have variables separable.


Example

The homogeneous first order linear equation $\frac{dy}{dx} + f(x)y = 0$ has variables separable. Writing

$$\begin{aligned}
dy + f(x)y\, dx &= 0 \\
\frac{dy}{y} + f(x)\, dx &= 0 \\
\log|y| + F(x) &= \text{const.} \\
y &= \text{const.}\, e^{-F(x)}
\end{aligned}$$

gives the solution for $y \ne 0$, provided a function $F(x)$ can be found such that $F'(x) = f(x)$. The solution for $y = 0$ is $y = 0$.


A 1-form is said to be exact if it is equal to $dF$ for some function $F(x, y)$. Thus the differential equation $A\, dx + B\, dy = 0$ is solved by writing the 1-form $\omega = A\, dx + B\, dy$ as a non-zero multiple of an exact 1-form $dF$, after which $F = \text{const.}$ gives the solutions.

A 1-form $\omega$ is said to be closed if $d\omega = 0$. Since $d[dF] = 0$, every exact 1-form is closed. Locally the converse is true (every closed form is exact). Moreover, if $d\omega = 0$, then a function $F$ such that $\omega = dF$ can be found by 'integration', i.e. by antidifferentiation.




Example


The 1-form $y\, dx + (x + y)\, dy$ is (locally) exact because $d[y\, dx + (x + y)\, dy] = dy\, dx + dx\, dy = 0$. To find $F(x, y)$ such that $dF = y\, dx + (x + y)\, dy$, one can use the equation $\frac{\partial F}{\partial x} = y$ to obtain $F(x, y) = xy + C(y)$, where the constant of integration may depend on $y$. Then $\frac{\partial F}{\partial y} = x + y$ gives $C'(y) = y$, hence $C(y) = \frac{1}{2}y^2 + \text{const.}$ Thus $F(x, y) = xy + \frac{1}{2}y^2$ satisfies $dF = y\, dx + (x + y)\, dy$, and the differential equation

$$y\, dx + (x + y)\, dy = 0$$

is solved by

$$xy + \tfrac{1}{2}y^2 = \text{const.}$$
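As a numerical sanity check (not part of the method in the text), one can follow a solution curve of $y\, dx + (x + y)\, dy = 0$ through $(1, 1)$. The sketch applies a classical fourth-order Runge-Kutta step to $dy/dx = -y/(x + y)$ and checks that $F = xy + \frac{1}{2}y^2$ stays constant. The step size and interval are arbitrary.

```python
import math

def rk4_step(g, x, y, h):
    """One classical Runge-Kutta step for dy/dx = g(x, y)."""
    k1 = g(x, y)
    k2 = g(x + h / 2, y + h / 2 * k1)
    k3 = g(x + h / 2, y + h / 2 * k2)
    k4 = g(x + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

slope = lambda x, y: -y / (x + y)       # y dx + (x + y) dy = 0, valid where x + y != 0
F = lambda x, y: x * y + y * y / 2

x, y, h = 1.0, 1.0, 0.001
values = [F(x, y)]
for _ in range(1000):                    # integrate from x = 1 to x = 2
    y = rk4_step(slope, x, y, h)
    x += h
    values.append(F(x, y))
print(max(values) - min(values))         # tiny: F is constant along the solution
print(y, -x + math.sqrt(x * x + 3))      # exact solution through (1, 1), from F = 3/2
```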


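The same method applies to the solution of any equation $A\, dx + B\, dy = 0$ for which $d(A\, dx + B\, dy) = 0$, i.e. for which $\frac{\partial A}{\partial y} = \frac{\partial B}{\partial x}$. One finds (if possible) a function $F_0(x, y)$ such that $\frac{\partial F_0}{\partial x} = A$. Setting $F(x, y) = F_0(x, y) + C(y)$, the equation $\frac{\partial F}{\partial y} = B(x, y)$ becomes $\frac{\partial F_0}{\partial y} + C'(y) = B$, that is, $C'(y) = B - \frac{\partial F_0}{\partial y}$. The right side does not involve $x$ because

$$\frac{\partial}{\partial x}\left[B - \frac{\partial F_0}{\partial y}\right] = \frac{\partial B}{\partial x} - \frac{\partial A}{\partial y} = 0.$$

Hence $C'(y) = B - \frac{\partial F_0}{\partial y}$ can be solved (theoretically) for $C(y)$, so that $F(x, y)$ satisfies $dF = A\, dx + B\, dy$.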

A differential equation $A\, dx + B\, dy = 0$ in which $d(A\, dx + B\, dy) = 0$ is said to be exact (although it would be more consistent to call it 'closed'). The above shows that the solution of an exact differential equation can be achieved by solving

$$\frac{\partial F_0}{\partial x} = A(x, y), \qquad C'(y) = B(x, y) - \frac{\partial F_0}{\partial y}$$

and setting $F(x, y) = F_0(x, y) + C(y)$. Alternatively, one can solve

$$\frac{\partial F_0}{\partial y} = B(x, y), \qquad C'(x) = A(x, y) - \frac{\partial F_0}{\partial x}$$

and set $F(x, y) = F_0(x, y) + C(x)$. Thus the solution of an exact equation requires two antidifferentiations.


Example 


An equation A(x) dx + B(y) dy = 0 with variables sep- 
arated is exact and the function F(x, y) is found merely 
by antidifferentiating A(x) and B(y) separately as before. 


Example

The 1-form $d\theta = \frac{-y}{x^2 + y^2}\, dx + \frac{x}{x^2 + y^2}\, dy$ is closed. It is $dF$ for $F(x, y) = \operatorname{Arctan}(y/x)$ or $\operatorname{Arccot}(x/y)$. These functions are not defined for $x = 0$ or $y = 0$ respectively. There is no function $F(x, y)$ such that $dF$ is the given 1-form at all points $(x, y) \ne (0, 0)$. (In short, the 1-form $d\theta$ is closed and therefore is exact locally, but $\theta$ is not a global function of $(x, y)$.) The differential equation $d\theta = 0$ is solved by setting $\operatorname{Arctan}(y/x) = \text{const.}$, i.e. $y = Cx$, which gives rays through $(0, 0)$ as expected. The equation $d\theta = 0$ can also be solved by multiplying by $(x^2 + y^2)/xy$, which separates the variables and gives $-\frac{dx}{x} + \frac{dy}{y} = 0$, $-d\log|x| + d\log|y| = 0$, $\log|y/x| = \text{const.}$, $y = \text{const.}\, x$.


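Example

The 1-form $ye^{xy}\, dx + (xe^{xy} + \cos y)\, dy$ is closed. To find $F(x, y)$, write $\frac{\partial F}{\partial x} = ye^{xy}$, $F = e^{xy} + C(y)$, $\frac{\partial F}{\partial y} = xe^{xy} + \cos y = xe^{xy} + C'(y)$, $C(y) = \sin y + \text{const.}$, $F = e^{xy} + \sin y$. Thus

$$ye^{xy}\, dx + (xe^{xy} + \cos y)\, dy = 0$$

is solved by

$$e^{xy} + \sin y = \text{const.}$$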


To solve an arbitrary differential equation $\omega = 0$, it suffices to find a non-zero function $p(x, y)$ such that $p\omega$ is closed, and then to find $F$ satisfying $dF = p\omega$ by the method above. Such a function $p$ is called an 'integrating factor' for the 1-form $\omega$, because multiplication by the factor $p$ makes it possible to 'integrate', that is, to find an $F$.
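You can test numerically whether a given $p$ is an integrating factor by checking that $\partial(pA)/\partial y - \partial(pB)/\partial x = 0$. The sketch uses the 1-form $-y\, dx + x\, dy$, which is not closed. Multiplying by $p = 1/(x^2 + y^2)$ turns it into the closed form $d\theta$ of the example above. The sample points and step size are arbitrary.

```python
def closedness_defect(A, B, x, y, h=1e-5):
    """dA/dy - dB/dx by central differences; it vanishes exactly when d(A dx + B dy) = 0."""
    dA_dy = (A(x, y + h) - A(x, y - h)) / (2 * h)
    dB_dx = (B(x + h, y) - B(x - h, y)) / (2 * h)
    return dA_dy - dB_dx

# omega = -y dx + x dy is not closed: d(omega) = 2 dx dy.
A = lambda x, y: -y
B = lambda x, y: x
print(closedness_defect(A, B, 0.7, -1.3))      # about -2

# p = 1/(x^2 + y^2) is an integrating factor: p*omega is the closed form d(theta).
p = lambda x, y: 1 / (x * x + y * y)
pA = lambda x, y: p(x, y) * A(x, y)
pB = lambda x, y: p(x, y) * B(x, y)
print(closedness_defect(pA, pB, 0.7, -1.3))    # about 0
```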


Example

The inhomogeneous first order linear equation $\frac{dy}{dx} + f(x)y = g(x)$ always has an integrating factor of the form $p(x)$. Writing

$$p\omega = p(x)\, dy + p(x)[f(x)y - g(x)]\, dx$$

and setting $d(p\omega) = 0$ gives $p'(x) - p(x)f(x) = 0$. This is a homogeneous first order linear equation for $p$, with $-f$ in place of $f$. Its solution, found above, is $p(x) = \text{const.}\, e^{F(x)}$, where $F'(x) = f(x)$. Thus $e^{F(x)}\, dy + e^{F(x)}[f(x)y - g(x)]\, dx$ is closed and can be integrated, that is, it can be written as $dG$ for some function $G(x, y)$. Then $G(x, y) = \text{const.}$ solves the given differential equation.


It is shown in §8.5 that locally there exists an integrating factor for any equation $A\, dx + B\, dy = 0$. Specifically, suppose $A, B$ are differentiable functions defined near $(\bar{x}, \bar{y})$ that are not both zero at $(\bar{x}, \bar{y})$. Then there is a differentiable function $p$ defined near $(\bar{x}, \bar{y})$ such that $d(pA\, dx + pB\, dy) = 0$ near $(\bar{x}, \bar{y})$. This fact is obviously of no help, however, in finding a formula for $F$. The integrating factor $p$ may itself be a function for which there is no simple formula. Moreover, even if there is a simple integrating factor $p$, it may be very difficult to find.

In summary, a differential equation $A\, dx + B\, dy = 0$ is solved as follows:

1. Find an integrating factor $p$, that is, a function $p$ such that the equation $pA\, dx + pB\, dy = 0$ is exact. If $A\, dx + B\, dy = 0$ has variables separable or is already exact, such a $p$ can be found immediately. Otherwise it may not be possible to find a $p$, even though one always 'exists'.
2. Solve the exact equation $pA\, dx + pB\, dy = 0$. Two antidifferentiations, which in simple cases can be done with the techniques of elementary calculus, give a function $F(x, y)$ such that $dF = pA\, dx + pB\, dy$.
3. The curves $F = \text{const.}$ are then the solutions of the given equation.


Exercises


1 Find functions $F(x, y)$ such that the curves $F = \text{const.}$ solve the following differential equations.




(c) x” dy + y dx = xy dy 


(d) (2x + y)dx + (x + 2y)dy = 0 
(e) IY cos x = 0 
dx 
(f) $e^x(x^2 + y^2 + 2x)\, dx + 2ye^x\, dy = 0$
d 
(g) m + 2xy = 2x° 
X 


2 A differential equation gives a concise description of a 
family of curves. For example, dx = 0 describes lines parallel 
to the y-axis, x dy — y dx = 0 describes lines through the 
origin, x dx + ydy = 0 describes concentric circles about 
the origin, etc. Find differential equations describing the 
following families of curves. 


(a) hyperbolas whose asymptotes are the lines $x = \pm y$
(b) hyperbolas whose asymptotes are the coordinate axes
(c) circles with center $(1, 0)$

(d) ellipses with center at $(0, 0)$, with major axis along the $x$-axis, and with major axis twice as long as the minor axis

(e) all circles passing through both of the points $(1, 0)$ and $(-1, 0)$


3 Orthogonal trajectories. A curve is an 'orthogonal trajectory' of a family of curves if, at each point $(x, y)$ of the curve, its tangent line is perpendicular to the tangent line of the family's curve through that point. Thus if the family is described by a differential equation $A\, dx + B\, dy = 0$, a curve is an orthogonal trajectory of the family if and only if it is a solution of the differential equation $A\, dy - B\, dx = 0$. Sketch the orthogonal trajectories of the families $dx = 0$, $x\, dy - y\, dx = 0$, $x\, dx + y\, dy = 0$, and (a)-(e) above. Then give formulas for the orthogonal trajectories. [The 'integration' of $A\, dy - B\, dx = 0$ for the family (e) is not easy. Show that the curves $(x - 1)^2 + y^2 = C[(x + 1)^2 + y^2]$ $(C > 0)$ are orthogonal trajectories of (e), and show that these curves are circles.]


4 Sketch the curves $y = \text{const.}\,|x|^p$ for $p > 0$ and find their orthogonal trajectories.


8.3 Harmonic Functions and Conformal Coordinates

*In §8.4 it will be proved that a harmonic function is in fact analytic, that is, it can be written locally as the sum of a power series. This conclusion is even stronger than the conclusion that it is infinitely differentiable.


A function of two variables $u(x, y)$ is said to be harmonic if it is continuous and if its average value over any disk $D = \{(x - \bar{x})^2 + (y - \bar{y})^2 < r^2\}$ where it is defined is equal to its value at the center of the disk:

$$(1) \qquad u(\bar{x}, \bar{y}) = \frac{1}{\pi r^2}\iint_D u(x, y)\, dx\, dy.$$

Similarly, a function of $n$ variables $u(x_1, x_2, \ldots, x_n)$ is said to be harmonic if it is continuous and if its average value over any $n$-dimensional ball $D = \{(x_1 - \bar{x}_1)^2 + (x_2 - \bar{x}_2)^2 + \cdots + (x_n - \bar{x}_n)^2 < r^2\}$ where it is defined is equal to its value at the center of the ball:

$$(1') \qquad u(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n) = \frac{\int_D u(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n}{\int_D dx_1\, dx_2 \cdots dx_n}.$$

For the sake of simplicity, only harmonic functions of 
two variables u(x, y) will be considered in this section. 
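The defining property (1) can be tested numerically. The following Python sketch (the function names and the sample disk are illustrative assumptions) approximates the disk average in polar coordinates and compares it with the value at the center, once for $x^2 - y^2$ and once for $x^2 + y^2$, which is not harmonic; for the latter the disk average exceeds the center value by $r^2/2$.

```python
import math

def disk_average(u, xc, yc, r, n_r=200, n_theta=64):
    """Average of u over the disk of radius r about (xc, yc).

    Uses x = xc + rho cos t, y = yc + rho sin t, the midpoint rule in rho,
    equally spaced angles t, and the area element rho d(rho) dt.
    """
    dr, dt = r / n_r, 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        rho = (i + 0.5) * dr
        for j in range(n_theta):
            t = j * dt
            total += u(xc + rho * math.cos(t), yc + rho * math.sin(t)) * rho * dr * dt
    return total / (math.pi * r * r)

harmonic = lambda x, y: x * x - y * y       # satisfies Laplace's equation
not_harmonic = lambda x, y: x * x + y * y   # its Laplacian is 4, not 0

print(disk_average(harmonic, 1.0, 2.0, 0.5), harmonic(1.0, 2.0))          # both about -3
print(disk_average(not_harmonic, 1.0, 2.0, 0.5), not_harmonic(1.0, 2.0))  # about 5.125 vs 5
```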


Theorem 


A harmonic function u(x, y) is necessarily (continuously)
differentiable and its first partial derivatives $\partial u/\partial x$, $\partial u/\partial y$ are
themselves harmonic functions. Therefore a harmonic
function is necessarily infinitely differentiable* and all its
partial derivatives, e.g. $\partial^2 u/\partial x^2$, $\partial^3 u/\partial x\,\partial y^2$, etc., are harmonic
functions. Moreover, the second partial derivatives of a
harmonic function u(x, y) necessarily satisfy the relation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0,$$

known as Laplace's equation. Conversely, any twice
differentiable function u(x, y) which satisfies Laplace's
equation is harmonic.


This theorem makes it easy to find examples of harmonic
functions, which is not at all easy to do on the
basis of the defining property (1). For example, the
theorem says that a quadratic form $ax^2 + 2bxy + cy^2$
is a harmonic function if and only if $2a + 2c = 0$. In
the same way a cubic form $ax^3 + 3bx^2y + 3cxy^2 + ey^3$
is harmonic if and only if $6ax + 6by + 6cx + 6ey = 0$,


8.3 | Harmonic Functions and Conformal Coordinates 279 


that is, if and only if $a + c = 0$, $b + e = 0$. Thus the
theorem implies immediately that a cubic polynomial in
x and y is a harmonic function if and only if it is a
polynomial of the form

$$A + Bx + Cy + D(x^2 - y^2) + E(2xy) + F(x^3 - 3xy^2) + G(y^3 - 3yx^2).$$

This is not at all obvious from the definition (1).

*A k-form is said to be 'weakly closed' if its integral over the
boundary of any (k + 1)-dimensional domain is zero. The statement
here is a special case of a general theorem which states that the
pullback of any m-form under a differentiable map $\mathbb{R}^n \to \mathbb{R}^m$ is weakly
closed. If the function u is assumed to be differentiable, rather than
merely continuous, this follows from
$f^*[d(u\,dx\,dy)] = d[f^*(u\,dx\,dy)]$, because $d(u\,dx\,dy) = 0$ on the plane.
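A quick way to confirm the cubic example is the five-point difference approximation to $\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2$, which is exact (up to rounding) for polynomials of degree at most 3. The sketch below is an illustration under that assumption; the coefficients are arbitrary choices, not from the text.

```python
def discrete_laplacian(u, x, y, h=1e-2):
    """Five-point approximation to u_xx + u_yy at (x, y).

    Its error term involves fourth derivatives, so it is exact (up to rounding)
    for polynomials of degree at most 3.
    """
    return (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h) - 4 * u(x, y)) / h**2

def harmonic_cubic(A, B, C, D, E, F, G):
    """The general harmonic polynomial of degree <= 3 given in the text."""
    return lambda x, y: (A + B * x + C * y + D * (x**2 - y**2) + E * (2 * x * y)
                         + F * (x**3 - 3 * x * y**2) + G * (y**3 - 3 * y * x**2))

def cubic_form(a, b, c, e):
    """a x^3 + 3b x^2 y + 3c x y^2 + e y^3; its Laplacian is 6(a + c)x + 6(b + e)y."""
    return lambda x, y: a * x**3 + 3 * b * x**2 * y + 3 * c * x * y**2 + e * y**3

u = harmonic_cubic(1, -2, 3, 0.5, -1.5, 2, -0.25)   # arbitrary coefficients
w = cubic_form(1, 0, 1, 0)                          # a + c = 2, so not harmonic

print(discrete_laplacian(u, 0.3, -0.7))   # approximately 0
print(discrete_laplacian(w, 0.3, -0.7))   # approximately 6 * 2 * 0.3 = 3.6
```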


Proof 


Let $(\bar x, \bar y)$ be a point of the domain of u and let R be a
number such that the disk $D = \{(x - \bar x)^2 +
(y - \bar y)^2 \le R^2\}$ lies entirely inside the domain of u.
The first step of the proof is to show that

$$\lim_{h \to 0}\frac{u(\bar x + h, \bar y) - u(\bar x, \bar y)}{h}$$

exists and is equal to $\frac{1}{\pi R^2}\int_{\partial D} u(x, y)\,dy$. To this end,
consider the map

$$x = a + c, \qquad y = b$$

of abc-space to the xy-plane, and consider the cylinder
$C = \{(a - \bar x)^2 + (b - \bar y)^2 \le R^2,\ 0 \le c \le h\}$ in abc-space.
The integral over $\partial C$ of the pullback of any
2-form on the xy-plane is zero. This can be seen
geometrically from the fact that the map carries $\partial C$ to the
xy-plane in such a way that every point in the image is
covered twice with opposite orientations, so that the
whole integral cancels.* More specifically, the image of
$\partial C$ is divided in a natural way into five regions and $\partial C$
is then divided into ten pieces such that each region of
the image is the image of two pieces of $\partial C$ with opposite
orientations. These pieces can easily be described in
detail and the above statement can thereby be proved
rigorously using the principle of independence of parameter.
(A slight technical difficulty arises from the fact
that some of the pieces are not differentiable
surfaces-with-boundary because their corners are 'spikes' where
the boundary curves are tangent, but these details will be
omitted.)

Applying this observation to the 2-form $u(x, y)\,dx\,dy$, whose
pullback is $u(a + c, b)(da + dc)\,db$, gives

$$\int_{\partial C}[u(a + c, b)\,da\,db + u(a + c, b)\,dc\,db] = 0.$$




Orienting C by $da\,db\,dc$ and orienting $\partial C$ accordingly,
the integrals over the two disks at the ends of C are equal
to $\pi R^2 u(\bar x + h, \bar y)$ and $-\pi R^2 u(\bar x, \bar y)$ by (1), since only the
$da\,db$ term contributes on them (c is constant there). Thus

$$\pi R^2[u(\bar x + h, \bar y) - u(\bar x, \bar y)] = \int_{\text{sleeve}} u(a + c, b)\,db\,dc,$$

where the 'sleeve' is the portion $\{(a - \bar x)^2 + (b - \bar y)^2 =
R^2,\ 0 \le c \le h\}$ of $\partial C$. Parameterizing the sleeve by
$a = \bar x + R\cos\theta$, $b = \bar y + R\sin\theta$, $c = t$, this integral
becomes

$$\iint u(\bar x + R\cos\theta + t, \bar y + R\sin\theta)\,R\cos\theta\,d\theta\,dt,$$

where the integral is over the rectangle $\{0 \le \theta \le 2\pi,\
0 \le t \le h\}$ oriented $d\theta\,dt$. (Checking the orientation of
the rectangle is a good exercise in orientations.) Writing
this as an iterated integral and passing to the limit as
$h \to 0$ gives

$$\lim_{h \to 0}\frac{1}{h}\int_0^h\left[\int_0^{2\pi} u(\bar x + R\cos\theta + t, \bar y + R\sin\theta)\,R\cos\theta\,d\theta\right]dt = \int_0^{2\pi} u(\bar x + R\cos\theta, \bar y + R\sin\theta)\,R\cos\theta\,d\theta.$$

But this is just $\int_{\partial D} u\,dy$ when $\partial D$ is parameterized by
$x = \bar x + R\cos\theta$, $y = \bar y + R\sin\theta$. Thus the limit which
defines $\partial u/\partial x$ at $(\bar x, \bar y)$ exists and is equal to $\frac{1}{\pi R^2}\int_{\partial D} u\,dy$,
where D is any disk with center $(\bar x, \bar y)$ lying in the domain
of u, and where R is the radius of D. But for fixed R
the function $\int_{\partial D} u\,dy$ is clearly a continuous function of
$(\bar x, \bar y)$ (pass to the limit under the integral sign using the
theorem of §9.7); hence $\partial u/\partial x$ exists and is a continuous
function. Similarly $\partial u/\partial y$ exists and is continuous. Therefore
u is differentiable and by Stokes' Theorem

$$\frac{\partial u}{\partial x}(\bar x, \bar y) = \frac{1}{\pi R^2}\int_{\partial D} u\,dy = \frac{1}{\pi R^2}\iint_D \frac{\partial u}{\partial x}\,dx\,dy.$$

That is, $\partial u/\partial x$ is harmonic. By symmetry between x and y it
follows that $\partial u/\partial y$ also exists and is a harmonic function.
y 




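This completes the proof of the first statement of the
theorem. The remaining statements of the theorem are
summarized by the statement: A twice differentiable function
u satisfies the mean value property (1) if and only if
$\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2 = 0$. To prove this let D be a disk with
center $(\bar x, \bar y)$ and let $\tilde u(r, \theta)$ denote $u(\bar x + r\cos\theta,
\bar y + r\sin\theta)$. Then (1) gives

$$\pi R^2 u(\bar x, \bar y) = \iint_D \tilde u(r, \theta)\,r\,dr\,d\theta = \int_0^R\left[\int_0^{2\pi}\tilde u(r, \theta)\,r\,d\theta\right]dr,$$

where R is the radius of D. Differentiating with respect
to R gives

$$2\pi R\,u(\bar x, \bar y) = \int_0^{2\pi}\tilde u(R, \theta)\,R\,d\theta,$$

$$u(\bar x, \bar y) = \frac{1}{2\pi}\int_0^{2\pi}\tilde u(R, \theta)\,d\theta.$$

Thus the average value of u over any circle with center
$(\bar x, \bar y)$ is also equal to $u(\bar x, \bar y)$. Differentiating this relation
with respect to R gives

$$\begin{aligned}
0 &= \frac{1}{2\pi}\int_0^{2\pi}\left(\frac{\partial u}{\partial x}\cos\theta + \frac{\partial u}{\partial y}\sin\theta\right)d\theta
= \frac{1}{2\pi R}\int_0^{2\pi}\left(\frac{\partial u}{\partial x}\frac{\partial y}{\partial \theta} - \frac{\partial u}{\partial y}\frac{\partial x}{\partial \theta}\right)d\theta \\
&= \frac{1}{2\pi R}\int_{\partial D}\left(\frac{\partial u}{\partial x}\,dy - \frac{\partial u}{\partial y}\,dx\right)
= \frac{1}{2\pi R}\iint_D\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right)dx\,dy.
\end{aligned}$$

Since this is true for any disk, it follows that

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0.$$

But the steps in this argument are reversible: If
$\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2 = 0$, then the derivative with respect to R of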




$\frac{1}{2\pi}\int_0^{2\pi}\tilde u(R, \theta)\,d\theta$ is identically zero. Hence

$$\frac{1}{2\pi}\int_0^{2\pi}\tilde u(R, \theta)\,d\theta$$

is constant. For small values of R it is an average of
numbers all of which are nearly $u(\bar x, \bar y)$; hence

$$u(\bar x, \bar y) = \frac{1}{2\pi}\int_0^{2\pi}\tilde u(R, \theta)\,d\theta.$$

Multiplying both sides by $2\pi R\,dR$ and integrating from
0 to r then gives (1). This completes the proof of the
theorem.
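The first step of the proof identifies $\partial u/\partial x(\bar x, \bar y)$ with $\frac{1}{\pi R^2}\int_{\partial D} u\,dy$ for every radius R. A minimal numerical check, assuming the harmonic test function $u = e^x\cos y$ (whose x-derivative is again $e^x\cos y$) and an arbitrary center, evaluates the line integral with the trapezoidal rule, which is very accurate for smooth periodic integrands.

```python
import math

def line_integral_u_dy(u, xc, yc, R, n=256):
    """Trapezoidal approximation to the integral of u dy over the circle of radius R
    about (xc, yc), traversed counterclockwise: x = xc + R cos t, y = yc + R sin t,
    so that dy = R cos t dt."""
    dt = 2 * math.pi / n
    return dt * sum(u(xc + R * math.cos(k * dt), yc + R * math.sin(k * dt)) * R * math.cos(k * dt)
                    for k in range(n))

u = lambda x, y: math.exp(x) * math.cos(y)     # harmonic test function (an assumption)
u_x = lambda x, y: math.exp(x) * math.cos(y)   # its exact partial derivative in x

xc, yc = 0.4, -0.3
for R in (0.1, 0.5, 1.0):
    # (1 / pi R^2) * (integral of u dy) should equal u_x at the center for every R
    print(R, line_integral_u_dy(u, xc, yc, R) / (math.pi * R**2), u_x(xc, yc))
```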


If u is harmonic then the 1-form $\omega = \frac{\partial u}{\partial x}\,dy - \frac{\partial u}{\partial y}\,dx$
is closed, i.e., satisfies $d\omega = 0$ (because $d\omega = (\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2)\,dx\,dy$).
It follows that locally ω is exact, that is, $\omega = dv$ for some function $v(x, y)$ which
can be found from ω by integration. For example, if
$u(x, y) = x^2 - y^2$, then $dv = \frac{\partial u}{\partial x}\,dy - \frac{\partial u}{\partial y}\,dx = 2x\,dy +
2y\,dx$ gives $v = 2xy + \text{const.}$ The relationship between
these two functions $u = x^2 - y^2$, $v = 2xy$ can be seen
very clearly by plotting the hyperbolas u = const.,
v = const. First of all, these hyperbolas are orthogonal;
that is, the curves u = const. and v = const. intersect
at right angles (see Exercise 3, §8.2). Moreover, if the
constants are evenly spaced, say u = 1, 1.01, 1.02,
1.03, ... and v = 1, 1.01, 1.02, 1.03, ..., then the
network of curves very nearly forms squares. The same
phenomenon will occur if one starts with any harmonic
function u and defines v by $dv = \frac{\partial u}{\partial x}\,dy - \frac{\partial u}{\partial y}\,dx$. To
verify this, note that if $\bar u = u(\bar x, \bar y)$, $\bar v = v(\bar x, \bar y)$ and if δ
is a small number, then the curves $u = \bar u + \delta$, $v = \bar v$
intersect approximately at the point $(\bar x + \Delta x, \bar y + \Delta y)$
where Δx, Δy are defined by

$$\delta = \Delta u = \frac{\partial u}{\partial x}(\bar x, \bar y)\,\Delta x + \frac{\partial u}{\partial y}(\bar x, \bar y)\,\Delta y,$$

$$0 = \Delta v = \frac{\partial v}{\partial x}(\bar x, \bar y)\,\Delta x + \frac{\partial v}{\partial y}(\bar x, \bar y)\,\Delta y,$$


from which

$$\Delta x = \frac{\delta A}{A^2 + B^2}, \qquad \Delta y = \frac{\delta B}{A^2 + B^2},$$

where

$$A = \frac{\partial u}{\partial x}(\bar x, \bar y) = \frac{\partial v}{\partial y}(\bar x, \bar y), \qquad B = \frac{\partial u}{\partial y}(\bar x, \bar y) = -\frac{\partial v}{\partial x}(\bar x, \bar y).$$

Similarly the curves $u = \bar u$, $v = \bar v + \delta$ intersect
approximately at $(\bar x + \Delta x, \bar y + \Delta y)$, where

$$0 = A\,\Delta x + B\,\Delta y, \qquad \delta = -B\,\Delta x + A\,\Delta y,$$

$$\Delta x = \frac{-\delta B}{A^2 + B^2}, \qquad \Delta y = \frac{\delta A}{A^2 + B^2}.$$

Thus the four points of intersection of the four curves
$u = \bar u$, $u = \bar u + \delta$, $v = \bar v$, $v = \bar v + \delta$ lie approximately
at the corners of the square with vertices

$$(\bar x, \bar y), \quad \left(\bar x + \frac{\delta A}{A^2 + B^2}, \bar y + \frac{\delta B}{A^2 + B^2}\right), \quad \left(\bar x - \frac{\delta B}{A^2 + B^2}, \bar y + \frac{\delta A}{A^2 + B^2}\right), \quad \left(\bar x + \frac{\delta(A - B)}{A^2 + B^2}, \bar y + \frac{\delta(A + B)}{A^2 + B^2}\right).$$

Moreover, since $du\,dv = (A\,dx + B\,dy)(-B\,dx + A\,dy) = (A^2 + B^2)\,dx\,dy$,
the orientations $du\,dv$ and $dx\,dy$ agree and the area of the 'square' is approximately
$\delta^2/(A^2 + B^2)$.

A pair of differentiable functions $u(x, y)$, $v(x, y)$ is said
to define conformal coordinates near a point $(\bar x, \bar y)$ if*

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x} \qquad (2)$$

and if $du\,dv \ne 0$, i.e., $(\partial u/\partial x)^2 + (\partial u/\partial y)^2 \ne 0$.
Geometrically this means that the curves u = const., v = const.
are orthogonal and divide the xy-plane into squares
(approximately).

*These equations (2) are called the 'Cauchy-Riemann equations'.

If $u(x, y)$ is a harmonic function, then
$dv = -\frac{\partial u}{\partial y}\,dx + \frac{\partial u}{\partial x}\,dy$ can be integrated (locally) to
give a function v such that (u, v) are conformal coordinates
provided $du = \frac{\partial u}{\partial x}\,dx + \frac{\partial u}{\partial y}\,dy \ne 0$ at the point
under consideration. (If $du = 0$ at the point, then v
still exists but (u, v) are not conformal coordinates
because $du\,dv = 0$.) Conversely, if (u, v) are conformal
coordinates, then the integral of the 1-form
$dv = -\frac{\partial u}{\partial y}\,dx + \frac{\partial u}{\partial x}\,dy$ over any circle is zero and the proof
above shows that u is a harmonic function. If (u, v) are
conformal coordinates, then so are (v, -u). Hence v
must also be harmonic. Thus if (u, v) are conformal
coordinates, then both u and v are harmonic functions and
either of them suffices to determine the other up to an
additive constant.
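The 'squares' argument can be tested by solving for the intersection points exactly (here by Newton's method, a choice made for this sketch) and comparing them with the linear predictions $(\bar x + \delta A/(A^2 + B^2), \bar y + \delta B/(A^2 + B^2))$ and $(\bar x - \delta B/(A^2 + B^2), \bar y + \delta A/(A^2 + B^2))$. The example $u = x^2 - y^2$, $v = 2xy$, the base point, and δ are assumptions; the discrepancy should shrink like $\delta^2$.

```python
def newton_intersection(u, v, jac, target_u, target_v, x, y, steps=20):
    """Solve u(x, y) = target_u, v(x, y) = target_v by Newton's method from (x, y).
    jac(x, y) returns ((u_x, u_y), (v_x, v_y))."""
    for _ in range(steps):
        (a, b), (c, d) = jac(x, y)
        ru, rv = u(x, y) - target_u, v(x, y) - target_v
        det = a * d - b * c
        # Cramer's rule for [[a, b], [c, d]] (dx, dy) = (ru, rv)
        dx = (d * ru - b * rv) / det
        dy = (a * rv - c * ru) / det
        x, y = x - dx, y - dy
    return x, y

u = lambda x, y: x * x - y * y
v = lambda x, y: 2 * x * y
jac = lambda x, y: ((2 * x, -2 * y), (2 * y, 2 * x))   # u_x = v_y and u_y = -v_x

xb, yb, delta = 1.0, 0.5, 1e-3
A, B = 2 * xb, -2 * yb            # A = u_x and B = u_y at (xb, yb)
s = delta / (A * A + B * B)

corner1 = newton_intersection(u, v, jac, u(xb, yb) + delta, v(xb, yb), xb, yb)
corner2 = newton_intersection(u, v, jac, u(xb, yb), v(xb, yb) + delta, xb, yb)
print(corner1, (xb + s * A, yb + s * B))   # agree up to O(delta^2)
print(corner2, (xb - s * B, yb + s * A))   # agree up to O(delta^2)
```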

By the Implicit Function Theorem the equations
$u = u(x, y)$, $v = v(x, y)$ can be solved locally for x, y
as functions of u, v, and the rule for implicit differentiation
gives

$$du = A\,dx + B\,dy, \quad dv = -B\,dx + A\,dy \quad\Longrightarrow\quad dx = \frac{A\,du - B\,dv}{A^2 + B^2}, \quad dy = \frac{B\,du + A\,dv}{A^2 + B^2},$$

so that

$$\frac{\partial x}{\partial u} = \frac{A}{A^2 + B^2} = \frac{\partial y}{\partial v}, \qquad \frac{\partial x}{\partial v} = \frac{-B}{A^2 + B^2} = -\frac{\partial y}{\partial u}.$$

Hence $x(u, v)$, $y(u, v)$ are conformal coordinates on the
uv-plane. By the chain rule, if (r, s) are conformal
coordinates on the uv-plane and if (u, v) are conformal
coordinates on the xy-plane, then

$$dr = C\,du + D\,dv, \quad ds = -D\,du + C\,dv, \qquad du = A\,dx + B\,dy, \quad dv = -B\,dx + A\,dy$$

give

$$dr = (CA - DB)\,dx + (CB + DA)\,dy, \qquad ds = (-DA - CB)\,dx + (-DB + CA)\,dy,$$




which shows that the composite functions (r, s) define 
conformal coordinates on the xy-plane. 

It follows in particular that if $r(u, v)$ is any harmonic
function of u and v, then the composite function
$r(u(x, y), v(x, y))$ is a harmonic function of x and y
whenever (u, v) are conformal coordinates. This makes
it possible to find many harmonic functions very easily.
For example, setting $r = u^2 - v^2$, $u = x^2 - y^2$,
$v = 2xy$ shows that $(x^2 - y^2)^2 - (2xy)^2 = x^4 -
6x^2y^2 + y^4$ is harmonic.
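The composite $x^4 - 6x^2y^2 + y^4$ can be checked with a five-point difference stencil. For a quartic the stencil is no longer exact: it returns $u_{xx} + u_{yy} + \frac{h^2}{12}(u_{xxxx} + u_{yyyy})$, which here is $0 + 4h^2$, so a tiny residue rather than an exact zero is expected. The sample point and step size are arbitrary assumptions.

```python
def discrete_laplacian(f, x, y, h=1e-3):
    """Five-point approximation to f_xx + f_yy; for a quartic polynomial the
    error is exactly (h**2 / 12) * (f_xxxx + f_yyyy)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4 * f(x, y)) / h**2

r = lambda u, v: u * u - v * v   # harmonic in (u, v)
u = lambda x, y: x * x - y * y   # (u, v) are conformal coordinates
v = lambda x, y: 2 * x * y

composite = lambda x, y: r(u(x, y), v(x, y))
expanded = lambda x, y: x**4 - 6 * x**2 * y**2 + y**4

x0, y0 = 0.7, -1.2
print(composite(x0, y0), expanded(x0, y0))    # equal up to rounding
print(discrete_laplacian(composite, x0, y0))  # about 4 * h**2 = 4e-6, i.e. essentially 0
```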

An extremely important set of conformal coordinates
on the xy-plane can be derived from the observation that
the radial lines $ax + by = 0$ (a, b const.) are orthogonal
to the circles $r = \sqrt{x^2 + y^2} = \text{const.}$ If a large
number of evenly spaced radial lines are drawn, and if
circles are then to be drawn in such a way as to form
squares, the circles must be drawn more densely near the
origin. Specifically, the density must be inversely
proportional to the circumference, which is proportional to r.
This leads to the conjecture that setting $du = dr/r$ will lead
to a set of conformal coordinates on the plane.

Now $du = dr/r$ gives $u = \log r$, a function defined at
all points of the plane other than (0, 0). This function is
indeed harmonic, as is seen by writing

$$du = \frac{dr}{r} = \frac{1}{2}\frac{d(r^2)}{r^2} = \frac{x\,dx + y\,dy}{x^2 + y^2}, \qquad -\frac{\partial u}{\partial y}\,dx + \frac{\partial u}{\partial x}\,dy = \frac{x\,dy - y\,dx}{x^2 + y^2}.$$

Since this last 1-form is closed (it is $d(\operatorname{Arctan}(y/x))$ or
$d(\operatorname{Arccot}(x/y))$), it follows not only that $u = \log r$ is
harmonic but also that $(\log r, \theta)$ are conformal
coordinates, where θ denotes the 'multiple-valued function'
$\operatorname{Arctan}(y/x)$. Near any point $(\bar x, \bar y) \ne (0, 0)$ one can
define a function θ + const., for example $\operatorname{Arctan}(y/x)$
or $\operatorname{Arccot}(x/y)$, such that $(\log r, \theta)$ are conformal
coordinates; but no one function with this property can be
defined for all $(x, y) \ne (0, 0)$.

Inverting the equations $u = \log r$, $v = \theta$ gives $r = e^u$,
$x = r\cos\theta = e^u\cos v$, $y = r\sin\theta = e^u\sin v$, which
leads to the conclusion that the functions $(e^u\cos v,
e^u\sin v)$ are conformal coordinates on the uv-plane
(because the inverse of conformal coordinates gives
conformal coordinates). This is immediately verified:

$$\frac{\partial x}{\partial u} = e^u\cos v = \frac{\partial y}{\partial v}, \qquad \frac{\partial x}{\partial v} = -e^u\sin v = -\frac{\partial y}{\partial u}.$$

Thus $e^u\cos v$ and $e^u\sin v$ are both harmonic functions of
u and v.
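A finite-difference check of the Cauchy-Riemann equations (2) for $(\log r, \theta)$ and for $(e^u\cos v, e^u\sin v)$ is sketched below. It assumes `math.atan2` as one branch of θ, so the sample points (arbitrary choices) avoid that function's branch cut along the negative x-axis.

```python
import math

def partials(f, x, y, h=1e-6):
    """Central-difference approximations to (f_x, f_y)."""
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))

def cr_residuals(u, v, x, y):
    """(u_x - v_y, u_y + v_x); both vanish when (u, v) satisfy equations (2)."""
    (ux, uy), (vx, vy) = partials(u, x, y), partials(v, x, y)
    return ux - vy, uy + vx

log_r = lambda x, y: math.log(math.hypot(x, y))
theta = lambda x, y: math.atan2(y, x)          # one branch of the multiple-valued angle

exp_cos = lambda u, v: math.exp(u) * math.cos(v)
exp_sin = lambda u, v: math.exp(u) * math.sin(v)

for point in [(1.0, 0.5), (-0.3, 2.0), (0.2, -0.9)]:
    print(cr_residuals(log_r, theta, *point),      # approximately (0, 0)
          cr_residuals(exp_cos, exp_sin, *point))  # approximately (0, 0)
```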

Since the difference of two harmonic functions is
obviously a harmonic function, the function

$$u = \tfrac{1}{2}\left\{\log[(x - x_0)^2 + (y - y_0)^2] - \log[(x - x_1)^2 + (y - y_1)^2]\right\} \qquad (3)$$

is a harmonic function defined for all (x, y) other than
$(x_0, y_0)$, $(x_1, y_1)$.

[Figure: the curves u = const. are solid; the orthogonal curves are dotted.]

The curves u = const. are the curves

$$\frac{(x - x_0)^2 + (y - y_0)^2}{(x - x_1)^2 + (y - y_1)^2} = \text{positive const.}$$

It is easily seen that such a curve is a circle except when
the positive constant is 1 (i.e., u = 0), in which case it
is the perpendicular bisector of the line segment from
$(x_0, y_0)$ to $(x_1, y_1)$. This harmonic function, which can
be regarded as describing a 'source' of some kind at
$(x_0, y_0)$ and a 'sink' of equal magnitude at $(x_1, y_1)$, has
many physical applications.
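The claim that the level curves of (3) are circles can be made concrete: expanding $|p - p_0|^2 = k|p - p_1|^2$ gives a circle with center $(p_0 - kp_1)/(1 - k)$ whenever $k \ne 1$. The sketch below (helper names hypothetical; the points and the constant k are arbitrary) computes that circle and confirms that u takes the single value $\frac{1}{2}\log k$ on it.

```python
import math

def level_circle(p0, p1, k):
    """Center and radius of the curve |p - p0|^2 = k |p - p1|^2 (k > 0, k != 1).

    Expanding gives (1 - k)|p|^2 - 2 p.(p0 - k p1) + |p0|^2 - k|p1|^2 = 0,
    a circle with center c = (p0 - k p1) / (1 - k).
    """
    (x0, y0), (x1, y1) = p0, p1
    cx, cy = (x0 - k * x1) / (1 - k), (y0 - k * y1) / (1 - k)
    const = (x0**2 + y0**2 - k * (x1**2 + y1**2)) / (1 - k)
    return (cx, cy), math.sqrt(cx**2 + cy**2 - const)

def u3(x, y, p0, p1):
    """The harmonic function (3)."""
    (x0, y0), (x1, y1) = p0, p1
    return 0.5 * (math.log((x - x0)**2 + (y - y0)**2) - math.log((x - x1)**2 + (y - y1)**2))

p0, p1, k = (0.0, 0.0), (2.0, 1.0), 0.25
(cx, cy), rad = level_circle(p0, p1, k)
values = [u3(cx + rad * math.cos(t), cy + rad * math.sin(t), p0, p1)
          for t in (0.0, 1.0, 2.5, 4.0)]
print(values, 0.5 * math.log(k))   # u is constant, equal to (1/2) log k, on the circle
```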

An important mathematical application of the harmonic
function (3) is the following: The curves u = const.
near $(x_0, y_0)$ are circles around $(x_0, y_0)$, but
$(x_0, y_0)$ is not at the center of these circles. The curves
orthogonal to u = const. can be shown (see Exercise 3,
§8.2) to be circles which pass through $(x_0, y_0)$ and
$(x_1, y_1)$. Near $(x_0, y_0)$ these are like radial lines through
$(x_0, y_0)$, and it is easily imagined that there is a
conformal change of coordinates near $(x_0, y_0)$ in which
these curves actually are radial lines, and hence in which
the circles u = const. have their centers at $(x_0, y_0)$. If
U(x, y) is any harmonic function, then with respect to
this imagined new coordinate system $U(x_0, y_0)$ is equal
to the average value of U on the circle u = const. This
average value is found by drawing a large number, say
N, of equally spaced radial lines (circles in
xy-coordinates) from $(x_0, y_0)$, evaluating U at the N points
where they intersect the circle u = const., adding, and
dividing by N. As $N \to \infty$ the average so obtained would
be expected to approach $U(x_0, y_0)$. With respect to
xy-coordinates these 'radial lines' are actually circles
through $(x_0, y_0)$, and the N points of intersection with
u = const. are clustered more densely on the part of the
circle nearest $(x_0, y_0)$. As $N \to \infty$ one then obtains
$U(x_0, y_0)$ as a weighted average of the values of U on
the eccentric circle around $(x_0, y_0)$, weighting the values
at the points nearer $(x_0, y_0)$ more heavily. This leads to
the expectation that if U(x, y) is harmonic on a disk
$\{(x - a)^2 + (y - b)^2 \le r^2\}$, then the value of U at any
point $(x_0, y_0)$ inside the disk can be written as a weighted
average of the values of U on the circle $\{(x - a)^2 +
(y - b)^2 = r^2\}$. The explicit formula, called Poisson's
Integral Formula, is

$$U(x_0, y_0) = \frac{1}{2\pi}\int_0^{2\pi} U(a + r\cos\theta, b + r\sin\theta)\,\frac{r^2 - (x_0 - a)^2 - (y_0 - b)^2}{(x_0 - a - r\cos\theta)^2 + (y_0 - b - r\sin\theta)^2}\,d\theta.$$

It is derived in §8.4. Note that if $x_0 = a$, $y_0 = b$, the
formula reduces to the statement that U(a, b) is the
(unweighted) average value of U on the circle
$\{(x - a)^2 + (y - b)^2 = r^2\}$.
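Poisson's Integral Formula can be checked numerically before it is derived. This sketch assumes the harmonic test function $U = e^x\cos y$ and an arbitrary disk and interior points; because the integrand is smooth and periodic in θ, the trapezoidal rule with a few hundred points reproduces $U(x_0, y_0)$ to near machine precision.

```python
import math

def poisson_integral(U, a, b, r, x0, y0, n=512):
    """Poisson's Integral Formula on the circle of radius r about (a, b),
    evaluated with the trapezoidal rule in theta."""
    total = 0.0
    for k in range(n):
        t = 2 * math.pi * k / n
        weight = (r * r - (x0 - a)**2 - (y0 - b)**2) / (
            (x0 - a - r * math.cos(t))**2 + (y0 - b - r * math.sin(t))**2)
        total += U(a + r * math.cos(t), b + r * math.sin(t)) * weight
    return total / n   # (1 / 2pi) * (2pi / n) * sum

U = lambda x, y: math.exp(x) * math.cos(y)   # harmonic everywhere (an assumption)
a, b, r = 0.5, -1.0, 2.0
for (x0, y0) in [(0.5, -1.0), (1.2, 0.3), (-0.8, -2.0)]:
    print(poisson_integral(U, a, b, r, x0, y0), U(x0, y0))   # agree closely
```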

More generally, if D is any compact differentiable
two-dimensional manifold-with-boundary in the xy-plane,
and if U(x, y) is harmonic throughout D, then the values
of U on $\partial D$ determine the values of U throughout D.
This is proved as follows: If $U_1$ is another harmonic
function on D which agrees with U on $\partial D$, then $U - U_1$
is harmonic throughout D and is identically zero on $\partial D$.
If $U - U_1$ is not identically zero on D, then $|U - U_1|$
assumes a non-zero maximum at some point $(x_0, y_0)$
inside D. Let r be such that the circle $\{(x - x_0)^2 +
(y - y_0)^2 = r^2\}$ lies inside D and touches $\partial D$. Then,
since $U - U_1$ is harmonic, its value at $(x_0, y_0)$ is the
average of its values on the circle $\{(x - x_0)^2 +
(y - y_0)^2 = r^2\}$. On the other hand, the absolute value
of $U - U_1$ on the circle is at most $|U(x_0, y_0) -
U_1(x_0, y_0)|$ (by the choice of $(x_0, y_0)$) and strictly less than this at
some points (near $\partial D$). Since averaging decreases the
absolute value, this is a contradiction unless the assumption
that $U - U_1$ is not identically zero is false. Thus $U = U_1$, as was to be shown.

Thus the values of a harmonic function on the boundary
$\partial D$ of a compact domain D determine the values
throughout D, and theoretically one should be able to
find a formula for the values inside D in terms of the
values on the boundary. The Poisson Integral Formula
accomplishes this when D is a disk, but for other
domains, even for rectangles, an explicit formula is very
difficult to give, although such a formula does 'exist' for
all domains D.

Another question of considerable interest is: Given a
compact 2-dimensional domain D, and given a function
on $\partial D$, is there a harmonic function U on D whose
values on $\partial D$ are the given function? On physical
grounds the answer is 'yes' for the following reason:
Imagine D to be a sheet of tin and imagine the edge $\partial D$
of the sheet to be maintained at a temperature equal to
the value of the given function on $\partial D$. After a sufficient
period of time one would expect to arrive at a state of
thermal equilibrium, at which time the temperature of
the tin sheet would be the desired harmonic function
U(x, y) on D. The problem of proving that this statement
is true mathematically, that is, the problem of proving
that there exists a harmonic function with given boundary
values, is called the 'Dirichlet Problem'. Its successful
solution (for reasonable domains D there exists a
harmonic function on D with arbitrarily specified boundary
values) has played a major role in the history of
mathematics.

Exercises

1 Discuss harmonic functions of one variable. What is the
analogue of Laplace's equation? What is the analogue of the
Poisson Integral Formula?


2 Prove that a twice differentiable function u(x, y, z) of
three variables is harmonic [i.e., has the mean value property
(1′)] if and only if it satisfies Laplace's equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = 0.$$

[The proof of the two variable case given in the text uses
polar coordinates (r, θ), which do not generalize nicely to
three dimensions. What generalizes nicely is the 1-form

$$d\theta = \frac{x\,dy - y\,dx}{x^2 + y^2},$$

which becomes

$$\omega = \frac{x\,dy\,dz + y\,dz\,dx + z\,dx\,dy}{(x^2 + y^2 + z^2)^{3/2}}.$$

Show first that $dx\,dy\,dz = r^2\omega\,dr$ and that $d\omega = 0$. The
identity

$$u(\bar x, \bar y, \bar z)\int_D r^2\omega\,dr = \int_D u\,r^2\omega\,dr$$

is shown to be equivalent to

$$u(\bar x, \bar y, \bar z)\int_{\partial D} r^2\omega = \int_{\partial D} u\,r^2\omega$$

(D a ball with center $(\bar x, \bar y, \bar z)$), by considering both sides as
integrals with respect to r. Since r is constant on $\partial D$, this is

8.4 | Functions of a Complex Variable 


289 


$$u(\bar x, \bar y, \bar z)\int_{\partial D}\omega = \int_{\partial D} u\,\omega.$$

The left side is constant and the right side is nearly equal to
the left when the radius of D is small; hence the original
identity is equivalent to $\int_{\partial D} u\,\omega = \text{const.}$ as a function of the
radius of D. From

$$\int_{\partial D} u\,\omega = \int_{a^2 + b^2 + c^2 = 1} u(\bar x + ra, \bar y + rb, \bar z + rc)\,(a\,db\,dc + b\,dc\,da + c\,da\,db)$$

this can be shown to be equivalent to

$$\int_{\partial D}\left(\frac{\partial u}{\partial x}\,dy\,dz + \frac{\partial u}{\partial y}\,dz\,dx + \frac{\partial u}{\partial z}\,dx\,dy\right) = 0,$$

from which the desired conclusion follows.]


3 Prove that a function u(x, y, z) which is harmonic on a
reasonable domain D in $\mathbb{R}^3$ is determined by its values on $\partial D$.


4 Using the fact that $d\omega = 0$ (where ω is as in Exercise 2),
conclude that $r^{-1}$ is a harmonic function on $\mathbb{R}^3$ (not defined
at the origin). Prove in the same way that $r^{2-n}$ is a harmonic
function on $\mathbb{R}^n$ for $n > 3$.


5 Define ‘average over a sphere’ in such a way that u(x, y, z) 
is harmonic if and only if its average over any sphere is equal 
to its value at the center of the sphere. 


6 The conformal coordinates $(u, v) = (x^2 - y^2, 2xy)$ arise
from squaring the matrix

$$\begin{pmatrix} x & -y \\ y & x \end{pmatrix}\begin{pmatrix} x & -y \\ y & x \end{pmatrix} = \begin{pmatrix} x^2 - y^2 & -2xy \\ 2xy & x^2 - y^2 \end{pmatrix}.$$

Show that the nth power of this matrix gives in the same way
polynomials $P_n(x, y)$, $Q_n(x, y)$ of degree n which are
conformal coordinates except at (0, 0).


7 Show that if (u, v), (r, s) are conformal coordinates on the
xy-plane, then $(ur - vs, us + vr)$ are conformal coordinates
provided $d(ur - vs) \ne 0$. Express this in terms of 2 × 2
matrices.
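For Exercise 6, a numerical experiment (not a proof) may help. The sketch below raises $\begin{pmatrix} x & -y \\ y & x \end{pmatrix}$ to the nth power, reads $P_n$ and $Q_n$ off the first column, and checks the Cauchy-Riemann equations (2) by central differences. It also compares the result with Python's built-in complex power $(x + iy)^n$, anticipating the identification made in §8.4. The sample point, n, and helper names are assumptions.

```python
def mat_mul(m, n):
    """Product of 2x2 matrices written as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def P_Q(x, y, n):
    """(P_n(x, y), Q_n(x, y)): first column of [[x, -y], [y, x]] ** n, for n >= 1."""
    base = ((x, -y), (y, x))
    result = base
    for _ in range(n - 1):
        result = mat_mul(result, base)
    return result[0][0], result[1][0]

def cr_residuals(n, x, y, h=1e-6):
    """Central-difference residuals (P_x - Q_y, P_y + Q_x) of equations (2)."""
    P = lambda s, t: P_Q(s, t, n)[0]
    Q = lambda s, t: P_Q(s, t, n)[1]
    Px = (P(x + h, y) - P(x - h, y)) / (2 * h)
    Py = (P(x, y + h) - P(x, y - h)) / (2 * h)
    Qx = (Q(x + h, y) - Q(x - h, y)) / (2 * h)
    Qy = (Q(x, y + h) - Q(x, y - h)) / (2 * h)
    return Px - Qy, Py + Qx

x, y, n = 0.8, -0.6, 5
P, Q = P_Q(x, y, n)
z = complex(x, y) ** n
print((P, Q), (z.real, z.imag))   # agree up to rounding
print(cr_residuals(n, x, y))      # both approximately 0
```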


8.4 Functions of a Complex Variable


A complex number is a 2 × 2 matrix of the form

$$\begin{pmatrix} x & -y \\ y & x \end{pmatrix}$$

where x, y are real numbers. Such matrices are added to




each other, multiplied by each other, and multiplied by
real numbers, according to the rules of matrix algebra.
Letting 1, i denote the complex numbers

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$

respectively, every complex number z can be written in
just one way as $z = x \cdot 1 + y \cdot i$, where x and y are real
numbers. All the usual rules of arithmetic apply to the
addition and multiplication of complex numbers,
namely:

The sum of two complex numbers is a complex
number. Addition is associative and commutative:
$(z_1 + z_2) + z_3 = z_1 + (z_2 + z_3)$, $z_1 + z_2 = z_2 + z_1$.
Given any two complex numbers $z_1$, $z_2$ there is a unique
complex number z such that $z_1 + z = z_2$ (subtraction
axiom). The solution z of $z_1 + z = z_1$ is the same for
all $z_1$, namely the complex number

$$\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

This complex number is denoted by 0. The product of
two complex numbers is a complex number and the
operation of multiplication of complex numbers is
associative, distributive over addition, and commutative. The
commutative law does not apply to the multiplication of
arbitrary 2 × 2 matrices, but it does apply to complex
numbers:

$$\begin{pmatrix} x_1 & -y_1 \\ y_1 & x_1 \end{pmatrix}\begin{pmatrix} x_2 & -y_2 \\ y_2 & x_2 \end{pmatrix} = \begin{pmatrix} x_1x_2 - y_1y_2 & -x_1y_2 - y_1x_2 \\ y_1x_2 + x_1y_2 & -y_1y_2 + x_1x_2 \end{pmatrix} = \begin{pmatrix} x_2 & -y_2 \\ y_2 & x_2 \end{pmatrix}\begin{pmatrix} x_1 & -y_1 \\ y_1 & x_1 \end{pmatrix}.$$

This computation can also be written in the form

$$(x_1 \cdot 1 + y_1 \cdot i)(x_2 \cdot 1 + y_2 \cdot i) = x_1x_2 \cdot 1 \cdot 1 + x_1y_2 \cdot 1 \cdot i + y_1x_2 \cdot i \cdot 1 + y_1y_2 \cdot i \cdot i = (x_1x_2 - y_1y_2) \cdot 1 + (x_1y_2 + y_1x_2) \cdot i = (x_2 \cdot 1 + y_2 \cdot i)(x_1 \cdot 1 + y_1 \cdot i).$$

Given two complex numbers $z_1$, $z_2$ with $z_1 \ne 0$, there is
a unique complex number z such that $z_1z = z_2$ (division
axiom). This is proved by setting $z_1 = x_1 \cdot 1 + y_1 \cdot i$
and multiplying both sides of $z_1z = z_2$ by $x_1 \cdot 1 - y_1 \cdot i$
to find $(x_1^2 + y_1^2)z = (x_1 \cdot 1 - y_1 \cdot i)z_2$. By assumption,
$x_1^2 + y_1^2 \ne 0$; hence if $z_1z = z_2$, then z can only be the
complex number $(x_1^2 + y_1^2)^{-1}(x_1 \cdot 1 - y_1 \cdot i)z_2$. But
this complex number z does satisfy $z_1z = z_2$, which
proves that the division axiom holds. If z is any complex
number, then $1 \cdot z = z$, $0 \cdot z = 0$.
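The matrix model of complex numbers translates directly into code. The sketch below (helper names are hypothetical) represents $x + iy$ as the tuple-of-tuples matrix $\begin{pmatrix} x & -y \\ y & x \end{pmatrix}$, checks commutativity on one example, and implements the division recipe $z = (x_1^2 + y_1^2)^{-1}(x_1 \cdot 1 - y_1 \cdot i)z_2$ from the text, so the result can be compared with Python's built-in `complex` type.

```python
def cmat(x, y):
    """The complex number x + iy as the 2x2 matrix [[x, -y], [y, x]]."""
    return ((x, -y), (y, x))

def mat_mul(m, n):
    """Ordinary 2x2 matrix product."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def mat_scale(s, m):
    """Multiply every entry of m by the real number s."""
    return tuple(tuple(s * entry for entry in row) for row in m)

def divide(z2, z1):
    """Solve z1 * z = z2 as in the text: z = (x1^2 + y1^2)^(-1) (x1*1 - y1*i) z2."""
    x1, y1 = z1[0][0], z1[1][0]
    norm = x1 * x1 + y1 * y1
    if norm == 0:
        raise ZeroDivisionError("z1 must be non-zero")
    return mat_scale(1 / norm, mat_mul(cmat(x1, -y1), z2))

z1, z2 = cmat(3.0, 4.0), cmat(-1.0, 2.0)
print(mat_mul(z1, z2) == mat_mul(z2, z1))   # True: these matrices commute
print(mat_mul(z1, z2))                      # ((-11.0, -2.0), (2.0, -11.0)), i.e. -11 + 2i
q = divide(z2, z1)
print(q)                                    # about 0.2 + 0.4i in matrix form
```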

A complex number z is said to be 'real' if it has the
form $z = x \cdot 1$, where x is a real number. No substantial
ambiguity results if the distinction between the real
number x and the real complex number $x \cdot 1$ is dropped.
Then every complex number z can be written in only
one way as z = x + iy, where x, y are real numbers
(that is, real complex numbers).

The size or modulus of a complex number is defined to
be the square root of its determinant:

$$|x + iy| = \sqrt{\det\begin{pmatrix} x & -y \\ y & x \end{pmatrix}} = \sqrt{x^2 + y^2}.$$

For real numbers x the modulus is the ordinary absolute
value $\sqrt{x^2} = |x|$. The modulus has the familiar
properties of absolute values:

(i) $|z| = 0$ if and only if $z = 0$.
(ii) $|z_1z_2| = |z_1| \cdot |z_2|$.
(iii) $|z_1 + z_2| \le |z_1| + |z_2|$.

The first two of these statements are immediate from the
definitions, and the third is easily verified by squaring
and applying the Schwarz inequality $x_1x_2 + y_1y_2 \le
\sqrt{x_1^2 + y_1^2}\sqrt{x_2^2 + y_2^2}$.

Thus the complex numbers form an arithmetic (a 
‘field’ in the terminology of algebra) whose arithmetic 
operations are exactly like the arithmetic operations for 
real numbers. There are two major differences between 
these ‘arithmetics’: The first is that the real numbers have 
an order relation, and the complex numbers do not. For
example, i is neither 'greater than' nor 'less than' 1. The
second difference is that many algebraic equations which
do not have solutions in the arithmetic of real numbers
do have complex solutions. For example, the equation
$x^2 = -1$ has no real solution but has the complex
solutions $\pm i$.

The important similarity between the two arithmetics
from the point of view of calculus is that the modulus |z|
can be used to define limits of complex numbers. In fact,
the concepts of limit of a sequence, derivative, integral,
etc., can be defined for complex numbers using exactly
the same words as for real numbers, the only change
being that |z| denotes the modulus of a complex number
rather than the absolute value of a real number. For
example: A sequence of complex numbers $z_1, z_2, z_3, \ldots$
is said to converge to the limit $z_\infty$ if for every (real)
$\epsilon > 0$ there is an N such that $|z_n - z_\infty| < \epsilon$ whenever
$n > N$. Thus the sequence


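$$1,\quad 1 + z,\quad 1 + z + z^2,\quad 1 + z + z^2 + z^3,\quad 1 + z + z^2 + z^3 + z^4,\quad \ldots$$

converges to the limit $1/(1 - z)$ whenever $|z| < 1$
because

$$\left|\frac{1}{1 - z} - (1 + z + z^2 + \cdots + z^n)\right| = \left|\frac{1}{1 - z} - \frac{1 - z^{n+1}}{1 - z}\right| = \frac{|z|^{n+1}}{|1 - z|} \to 0.$$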


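The Cauchy Convergence Criterion states that a sequence
$z_1, z_2, z_3, \ldots$ converges (to some limit $z_\infty$) if and only if
for every $\epsilon > 0$ there is an N such that $|z_n - z_m| < \epsilon$
whenever $n, m > N$. Thus the sequence

$$z,\quad z + \frac{z^2}{2},\quad z + \frac{z^2}{2} + \frac{z^3}{3},\quad z + \frac{z^2}{2} + \frac{z^3}{3} + \frac{z^4}{4},\quad \ldots$$

converges for $|z| < 1$ because, for $n > m$,

$$|z_n - z_m| = \left|\frac{z^{m+1}}{m + 1} + \frac{z^{m+2}}{m + 2} + \cdots + \frac{z^n}{n}\right| \le |z|^{m+1} + |z|^{m+2} + \cdots + |z|^n \le \frac{|z|^{m+1}}{1 - |z|} \to 0.$$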


As in the real case, this means that $z_n$ is determined to
within an arbitrarily small margin for error ε by taking
n large, that is, increasing n does not significantly change
$z_n$ once n is large. The truth of the Cauchy Criterion for
sequences of complex numbers is easily deduced from
the corresponding statement for sequences of real
numbers (Exercise 2).

*Complex numbers are imagined as points in a plane in the same way
that real numbers are imagined as points on a line. This is discussed
below.

To be consistent with the notation of the rest of the book, the notation
$\bar z$ should be used here to denote 'a particular complex number'.
However, the notation $\bar z$ has another meaning (complex conjugate; see
below), so the notation $z_0$ is used instead.
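The Cauchy-criterion estimate for $z + z^2/2 + z^3/3 + \cdots$ can be observed numerically. The sketch below checks $|z_n - z_m| \le |z|^{m+1}/(1 - |z|)$ for one sample z and also compares a long partial sum with $-\log(1 - z)$; that closed form is a standard fact about this series (principal branch of the logarithm, valid for $|z| < 1$) and is not derived in this section. The sample z, m, and n are arbitrary.

```python
import cmath

def partial_sum(z, n):
    """z_n = z + z^2/2 + ... + z^n/n."""
    total, power = 0j, 1 + 0j
    for k in range(1, n + 1):
        power *= z
        total += power / k
    return total

z = 0.6 + 0.5j                     # |z| is about 0.78 < 1
m, n = 20, 60
gap = abs(partial_sum(z, n) - partial_sum(z, m))
bound = abs(z) ** (m + 1) / (1 - abs(z))
print(gap <= bound)                                  # True: the estimate in the text
print(abs(partial_sum(z, 200) + cmath.log(1 - z)))   # tiny: the limit is -log(1 - z)
```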

A function f assigning complex numbers f(z) to complex
numbers z is said to be continuous at a point* $z_0$
if $\lim_{z \to z_0} f(z) = f(z_0)$, that is, if for every (real) $\epsilon > 0$ there
is a $\delta > 0$ such that $|f(z) - f(z_0)| < \epsilon$ whenever
$|z - z_0| < \delta$. A function f(z) is said to be continuous
if it is continuous at every point where it is defined. A
function f(z) is said to be differentiable at a point $z_0$ if

$$\lim_{h \to 0}\frac{f(z_0 + h) - f(z_0)}{h}$$

exists; that is, if there is a complex number $f'(z_0)$ with the property that
for every $\epsilon > 0$ there is a $\delta > 0$ such that $f(z_0 + h)$ is
defined and

$$\left|\frac{f(z_0 + h) - f(z_0)}{h} - f'(z_0)\right| < \epsilon$$

whenever h is a non-zero complex number satisfying
$|h| < \delta$. A function f(z) is said to be (continuously)
differentiable if it is differentiable at every point and if its
derivative $f'(z)$ is a continuous function.

As in the real case, the sum f(z) + g(z) of two
differentiable functions f(z), g(z) is differentiable with
derivative $f'(z) + g'(z)$, and their product f(z)g(z) is
differentiable with derivative $f'(z)g(z) + f(z)g'(z)$,
because

$$\frac{[f(z + h) + g(z + h)] - [f(z) + g(z)]}{h} = \frac{f(z + h) - f(z)}{h} + \frac{g(z + h) - g(z)}{h} \to f'(z) + g'(z)$$

and

$$\frac{f(z + h)g(z + h) - f(z)g(z)}{h} = \frac{f(z + h) - f(z)}{h}\,g(z + h) + f(z)\,\frac{g(z + h) - g(z)}{h} \to f'(z)g(z) + f(z)g'(z).$$

(The precise proofs are exactly as in the real case.) Since
a constant function f(z) = a is differentiable with
derivative zero and the identity function f(z) = z is
differentiable with derivative identically 1, it follows that a
polynomial function $f(z) = a_nz^n + a_{n-1}z^{n-1} + \cdots +
a_1z + a_0$ is differentiable with derivative $na_nz^{n-1} +
(n - 1)a_{n-1}z^{n-2} + \cdots + a_1$.
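Complex differentiability requires the difference quotient to have the same limit however h approaches 0. The sketch below compares difference quotients along three directions for an arbitrary polynomial, whose derivative comes from the rule just stated, and for conjugation $z \mapsto \bar z$ (a standard counterexample, not taken from the text), whose quotient $\bar h/h$ depends on the direction and so has no limit. The polynomial, point, and step size are assumptions.

```python
import cmath

def difference_quotient(f, z0, h):
    """[f(z0 + h) - f(z0)] / h for a non-zero complex h."""
    return (f(z0 + h) - f(z0)) / h

poly = lambda z: 2 * z**3 - (1 + 1j) * z + 4   # an arbitrary polynomial
dpoly = lambda z: 6 * z**2 - (1 + 1j)           # its derivative by the polynomial rule
conj = lambda z: z.conjugate()                  # complex conjugation

z0 = 0.3 - 0.7j
for direction in (1, 1j, cmath.exp(0.8j)):      # approach z0 along three different rays
    h = 1e-6 * direction
    print(abs(difference_quotient(poly, z0, h) - dpoly(z0)),  # small for every direction
          difference_quotient(conj, z0, h))                   # about 1, then -1, then exp(-1.6j)
```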




The function f(z) = 1/z, defined for $z \ne 0$, is
differentiable with derivative

$$\lim_{h \to 0}\frac{1}{h}\left[\frac{1}{z + h} - \frac{1}{z}\right] = \lim_{h \to 0}\frac{z - (z + h)}{h(z + h)z} = \lim_{h \to 0}\frac{-1}{(z + h)z} = -\frac{1}{z^2}.$$

The composition f[g(z)] of differentiable functions is
differentiable with derivative $f'[g(z)] \cdot g'(z)$. (As in the
real case, the proof of this fact is a bit tricky. See
Exercise 10.) Thus if f(z) is differentiable and $f(z) \ne 0$, then
1/f(z) is differentiable with derivative $-f'(z)/[f(z)]^2$.

Geometrically, it is natural to represent complex
numbers as points in a plane, letting the complex number
z = x + iy correspond to the point of the xy-plane
(now called the z-plane) with coordinates (x, y). Thus 0
corresponds to the origin, 1 to the point (1, 0), i to the
point (0, 1). Real numbers correspond to points on the
x-axis. Multiplication by i carries (1, 0) to (0, 1), (0, 1) to
(-1, 0), and (x, y) to (-y, x); hence geometrically the
operation of multiplication by i is a rotation of 90° in
the positive sense. Multiplication by $i^2$ is a rotation of
180° carrying (x, y) to (-x, -y). More generally,
multiplication by the complex number z = a + ib carries
x + iy to (ax - by) + i(ay + bx); hence multiplication
by z = a + ib corresponds geometrically to the
linear transformation of the xy-plane whose matrix of
coefficients is

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}.$$

It is easy to give a geometrical description of this map
(Exercise 4). The operation of addition of complex
numbers is performed by adding corresponding
components, as $(x_1 + iy_1) + (x_2 + iy_2) = (x_1 + x_2) +
i(y_1 + y_2)$, which can be expressed geometrically by
saying that the points 0, $z_1$, $z_2$, $z_1 + z_2$ form the vertices
of a parallelogram.

A complex 1-form is an expression of the form f(z) dz
where f(z) is a complex-valued function of a complex
variable. The integral $\int_C f(z)\,dz$ of a 1-form f(z) dz over
a curve C in the z-plane is the limit of the approximating
sums

$$\sum_j f(z_j)\,\Delta z_j$$

formed by subdividing C into a large number of small
segments, choosing a point $z_j$ on the jth segment and
letting $\Delta z_j = \hat z_j - \hat z_{j-1}$ be the difference between the
end points $\hat z_j$, $\hat z_{j-1}$ of the jth segment. If f(z) dz is a
continuous 1-form, that is, if f(z) is a continuous
function, and if C is a compact, oriented, differentiable curve
in the z-plane, then it can be shown that the approximating
sums have a limiting value $\int_C f(z)\,dz$. However, rather
than prove that $\lim \sum_j f(z_j)\,\Delta z_j$ exists, one can instead
define the integral $\int_C f(z)\,dz$ in terms of real integrals

$$\int_C f(z)\,dz = \int_C (u\,dx - v\,dy) + i\int_C (u\,dy + v\,dx), \qquad (1)$$

where u(x, y), v(x, y) are the real-valued functions defined
by f(x + iy) = u(x, y) + iv(x, y). If f(z) is a continuous
function, then the real 1-forms u dx - v dy, u dy + v dx
are continuous and the integrals above are defined in
Chapter 6 for any compact, oriented, differentiable
curve C (with or without boundary points) in the z-plane.
It is true that the number defined by (1) is indeed the
limit of the approximating sums $\sum f\,\Delta z = \sum (u + iv)(\Delta x + i\Delta y) = \sum (u\,\Delta x - v\,\Delta y) + i\sum (u\,\Delta y + v\,\Delta x)$,
but the proof of this fact will be omitted.
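The approximating sums $\sum_j f(z_j)\,\Delta z_j$ can be formed directly. This sketch parameterizes a curve by $t \in [0, 1]$, takes $z_j$ at the parameter midpoint of each piece (one admissible choice of a point on the jth segment), and applies it to the counterclockwise unit circle. The values $2\pi i$ for $f(z) = 1/z$ and 0 for $f(z) = z^2$ are standard results quoted for comparison, not derived in this section.

```python
import cmath
import math

def contour_sum(f, path, n=2000):
    """Approximating sum  sum_j f(z_j) (z_j_end - z_j_start)  for a curve path(t),
    0 <= t <= 1, with z_j taken at the midpoint of each parameter interval."""
    total = 0j
    prev = path(0.0)
    for j in range(1, n + 1):
        curr = path(j / n)
        total += f(path((j - 0.5) / n)) * (curr - prev)
        prev = curr
    return total

unit_circle = lambda t: cmath.exp(2j * math.pi * t)   # counterclockwise unit circle
print(contour_sum(lambda z: 1 / z, unit_circle))       # close to 2*pi*i
print(contour_sum(lambda z: z**2, unit_circle))        # close to 0
```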

The Fundamental Theorem of Calculus

(2) $$f(z_1) - f(z_0) = \int_{z_0}^{z_1} f'(z)\,dz$$

(where the integral is taken over a curve from $z_0$ to $z_1$ along which $f$ is continuously differentiable) is simply Stokes' Theorem

(2) $$\int_{\partial C} f = \int_C df = \int_C f'(z)\,dz$$

for this case. The 0-dimensional 'integral' on the left is a sum over the boundary points of the curve $C$, each being 'oriented' plus or minus depending on whether the oriented curve $C$ leads into or out of the boundary point. To prove the formula (2) on the basis of the definition (1) of the right-hand side, one must note that the definition of differentiability implies that $f'(z)$ can be written in either the form

$$f'(z) = \lim_{h \to 0} \frac{f(z + h) - f(z)}{h} = \frac{\partial u}{\partial x} + i\,\frac{\partial v}{\partial x}$$

Chapter 8 | Applications


*In the remainder of this section a 'domain' $D$ will be a compact, differentiable two-dimensional manifold-with-boundary in the z-plane oriented by $dx\,dy$, and $\partial D$ will denote the boundary curve of $D$ oriented by the usual convention.


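or the form

$$f'(z) = \lim_{h \to 0} \frac{f(z + ih) - f(z)}{ih} = \frac{1}{i}\,\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y}$$

where the limit is through real values of $h$. Thus the differentiability of $f$ implies not only that the functions

$$\frac{\partial u}{\partial x}, \quad \frac{\partial u}{\partial y}, \quad \frac{\partial v}{\partial x}, \quad \frac{\partial v}{\partial y}$$

are continuous, but also that they satisfy

$$\frac{\partial u}{\partial x} + i\,\frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} - i\,\frac{\partial u}{\partial y},$$

which implies

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}.$$

Therefore

$$\begin{aligned}
\int_C f'(z)\,dz &= \int_C \Big(\frac{\partial u}{\partial x} + i\,\frac{\partial v}{\partial x}\Big)\,dz \\
&= \int_C \Big(\frac{\partial u}{\partial x}\,dx - \frac{\partial v}{\partial x}\,dy\Big) + i\int_C \Big(\frac{\partial u}{\partial x}\,dy + \frac{\partial v}{\partial x}\,dx\Big) \\
&= \int_C du + i\int_C dv = \int_{\partial C} f
\end{aligned}$$

as expected.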
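The relations $\partial u/\partial x = \partial v/\partial y$ and $\partial v/\partial x = -\partial u/\partial y$ can also be checked numerically. The sketch below (an illustration, not from the text) estimates the four partial derivatives of $f = u + iv$ by central differences, with an assumed step size `h`. It then reports the two residuals. These should be near zero for a complex-differentiable $f$ such as the exponential (Python's `cmath.exp`), and visibly non-zero for $\bar z = x - iy$.

```python
import cmath


def partials(f, z, h=1e-6):
    """Central-difference estimates of (u_x, u_y, v_x, v_y) for f = u + i v at z."""
    fx = (f(z + h) - f(z - h)) / (2 * h)            # estimates u_x + i v_x
    fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # estimates u_y + i v_y
    return fx.real, fy.real, fx.imag, fy.imag


def cauchy_riemann_residuals(f, z):
    """Return (u_x - v_y, v_x + u_y); both vanish when f is complex-differentiable at z."""
    u_x, u_y, v_x, v_y = partials(f, z)
    return u_x - v_y, v_x + u_y


z = 0.3 + 0.7j
print(cauchy_riemann_residuals(cmath.exp, z))                # both near 0
print(cauchy_riemann_residuals(lambda w: w.conjugate(), z))  # about (2, 0): not differentiable
```

For $\bar z$ one has $u = x$, $v = -y$, so $u_x - v_y = 2$. This is why the conjugate fails to be differentiable in the complex sense even though $u$ and $v$ are smooth.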

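Now if $f(z)$ is differentiable, then formally $d[f(z)\,dz] = f'(z)\,dz\,dz = 0$. Hence one would expect that if $f(z)$ is a differentiable function, then the 1-form $f(z)\,dz$ is closed. This is immediately verified, using the relations between the partial derivatives found above:

$$d[(u\,dx - v\,dy) + i(u\,dy + v\,dx)] = \Big(-\frac{\partial u}{\partial y} - \frac{\partial v}{\partial x}\Big)dx\,dy + i\Big(\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}\Big)dx\,dy = 0.$$

It follows (by Stokes' Theorem, $\int_{\partial D} f(z)\,dz = \int_D d[f(z)\,dz]$) that if $f$ is differentiable on a two-dimensional domain* $D$, then

$$\int_{\partial D} f(z)\,dz = 0.$$

This is known as Cauchy's Theorem.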
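A quick numerical check of Cauchy's Theorem, as a minimal sketch: take $D$ to be a disk and approximate $\int_{\partial D} f(z)\,dz$ with the trapezoid rule in $\theta$. This quadrature choice is an assumption; it is very accurate for smooth periodic integrands. For differentiable functions (the exponential, computed with Python's `cmath.exp`, and a polynomial) the result is essentially zero. For $\bar z = x - iy$, which fails the relations above, it is $2\pi i$.

```python
import cmath
import math


def boundary_integral(f, center=0j, radius=1.0, n=2000):
    """Trapezoid-rule approximation of the integral of f(z) dz over |z - center| = radius.

    The circle is traversed counterclockwise, z = center + radius e^{i theta}, so
    that dz = i (z - center) d theta.
    """
    step = 2 * math.pi / n
    total = 0j
    for k in range(n):
        z = center + radius * cmath.exp(1j * k * step)
        total += f(z) * 1j * (z - center) * step
    return total


print(abs(boundary_integral(cmath.exp)))           # essentially 0
print(boundary_integral(lambda z: z.conjugate()))  # about 2*pi*i
```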

Of course if $f(x)$ is a differentiable function of one real variable, then $d[f(x)\,dx] = 0$. In this case, however, there are no two-dimensional domains, hence no boundary curves $\partial D$ over which to integrate, and hence no Cauchy's Theorem. This difference, between the presence and the absence of Cauchy's Theorem, is the source of the vast difference between the calculus of complex functions and the calculus of real functions, which will be seen in the following paragraphs.

The complex 1-form $dz/z$ plays a central role in the theory of functions of a complex variable. It is closely related to the local conformal coordinates $(\log r, \theta)$ of §8.3, as is seen from the equations

$$\frac{dz}{z} = \frac{(x - iy)(dx + i\,dy)}{(x - iy)(x + iy)} = \frac{x\,dx + y\,dy}{x^2 + y^2} + i\,\frac{x\,dy - y\,dx}{x^2 + y^2} = d(\log r) + i\,d\theta.$$

In particular, if $f(z)$ is any continuous function of $z$, then the integral of $f(z)\,dz/z$ around the boundary of the disk $|z| \le r$ can be found by parameterizing $z = r\cos\theta + ir\sin\theta$, $0 \le \theta \le 2\pi$:

$$\begin{aligned}
\int_{|z| = r} f(z)\,\frac{dz}{z} &= \int_0^{2\pi} f(r\cos\theta + ir\sin\theta)\,\frac{(-r\sin\theta + ir\cos\theta)\,d\theta}{r\cos\theta + ir\sin\theta} \\
&= \int_0^{2\pi} f(r\cos\theta + ir\sin\theta)\,i\,d\theta \\
&= 2\pi i \times (\text{average value of } f \text{ on the circle } |z| = r).
\end{aligned}$$
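The identity $\int_{|z| = r} f(z)\,dz/z = 2\pi i \times (\text{average of } f)$ can be illustrated numerically. The sketch below uses an assumed trapezoid discretization of the circle and compares the two sides. With $f = e^z$ (Python's `cmath.exp`) the average comes out as $1 = e^0$, which is consistent with the argument that follows.

```python
import cmath
import math


def integral_dz_over_z(f, r, n=1000):
    """Trapezoid rule for the integral of f(z) dz / z over |z| = r (z = r e^{i theta})."""
    step = 2 * math.pi / n
    total = 0j
    for k in range(n):
        z = r * cmath.exp(1j * k * step)
        dz = 1j * z * step          # dz = i z d theta on the circle
        total += f(z) * dz / z      # so dz / z = i d theta
    return total


def circle_average(f, r, n=1000):
    """Average of f over n equally spaced points of the circle |z| = r."""
    return sum(f(r * cmath.exp(2j * math.pi * k / n)) for k in range(n)) / n


lhs = integral_dz_over_z(cmath.exp, 0.5)
rhs = 2j * math.pi * circle_average(cmath.exp, 0.5)
print(lhs, rhs)   # both about 6.283i
```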


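Consider now the integral of $f(z)\,dz/z$ around the boundary $\partial D$ of an arbitrary domain $D$ on which $f(z)$ is differentiable. If $D$ does not contain the point 0, then $1/z$ is defined and differentiable throughout $D$. Hence so is $f(z)\cdot(1/z)$, and

$$\int_{\partial D} f(z)\,\frac{dz}{z} = 0$$

by Cauchy's Theorem. If $D$ contains the point zero in its interior, then $D$ can be decomposed into two domains $D_1$, $D_2$. Here $D_1$ is the disk of radius $\varepsilon$ with center at 0 ($\varepsilon$ being small enough that this disk lies inside $D$) and $D_2$ is the remainder of $D$. Since 0 does not lie in $D_2$, Cauchy's Theorem gives $\int_{\partial D_2} f(z)\,dz/z = 0$; hence

$$\int_{\partial D} f(z)\,\frac{dz}{z} = \int_{\partial D_1} f(z)\,\frac{dz}{z} + \int_{\partial D_2} f(z)\,\frac{dz}{z} = \int_{|z| = \varepsilon} f(z)\,\frac{dz}{z} = 2\pi i \cdot (\text{average value of } f(z) \text{ on } |z| = \varepsilon).$$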


Thus the integral over $\partial D$ is the same for all domains $D$ which contain 0 in their interior. This number is equal to $2\pi i$ times the average value of $f$ on any circle with center at 0 lying inside $D$. By the continuity of $f$ this average value must be arbitrarily near $f(0)$ for $\varepsilon$ sufficiently small; hence it must be exactly $f(0)$ for all $\varepsilon$. The same argument applies to the evaluation of the integral of $f(z)\,dz/(z - a)$ over $\partial D$ to give the Cauchy Integral Formula

(3) $$\frac{1}{2\pi i}\int_{\partial D} \frac{f(z)\,dz}{z - a} = \begin{cases} f(a) & \text{if } a \text{ is inside } D \\ 0 & \text{if } a \text{ is outside } D \\ \text{not defined} & \text{if } a \text{ is on } \partial D \end{cases}$$

for any function $f(z)$ which is differentiable throughout the domain $D$.
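The first two cases of (3) can be observed numerically. The sketch below is illustrative only; the disk, the test points and the node count are assumptions. It takes $D$ to be the unit disk and $f = e^z$, and approximates the left side of (3) by the trapezoid rule on $|z| = 1$.

```python
import cmath
import math


def cauchy_formula_lhs(f, a, n=2000):
    """(1 / 2 pi i) times the integral of f(z) dz / (z - a) over |z| = 1, by the trapezoid rule."""
    step = 2 * math.pi / n
    total = 0j
    for k in range(n):
        z = cmath.exp(1j * k * step)
        total += f(z) * (1j * z * step) / (z - a)   # dz = i z d theta on |z| = 1
    return total / (2j * math.pi)


inside, outside = 0.3 + 0.2j, 2.0 - 1.0j
print(cauchy_formula_lhs(cmath.exp, inside), cmath.exp(inside))  # agree
print(abs(cauchy_formula_lhs(cmath.exp, outside)))              # essentially 0
```

For $a$ on the circle the integral is not defined, and the sum reflects this. With $a = 1$, for example, the first node gives a division by zero.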

The Cauchy Integral Formula shows in particular that the value $f(a)$ of a differentiable function $f$ at any point $a$ inside $D$ is determined by its values $f(z)$ on $\partial D$. Using the algebraic identity

$$1 - r^{n+1} = (1 - r)(1 + r + r^2 + \cdots + r^n)$$

with

$$r = \frac{z_0 - a}{z - a}, \qquad 1 - r = \frac{z - z_0}{z - a}$$

gives

$$\frac{1}{z - z_0} = \frac{1}{z - a} + \frac{z_0 - a}{(z - a)^2} + \cdots + \frac{(z_0 - a)^n}{(z - a)^{n+1}} + \frac{(z_0 - a)^{n+1}}{(z - z_0)(z - a)^{n+1}},$$

which combines with the Cauchy Integral Formula to give

$$\begin{aligned}
f(z_0) &= \frac{1}{2\pi i}\int_{\partial D} \frac{f(z)\,dz}{z - z_0} \\
&= f(a) + \frac{z_0 - a}{2\pi i}\int_{\partial D} \frac{f(z)\,dz}{(z - a)^2} + \cdots + \frac{(z_0 - a)^n}{2\pi i}\int_{\partial D} \frac{f(z)\,dz}{(z - a)^{n+1}} + \frac{(z_0 - a)^{n+1}}{2\pi i}\int_{\partial D} \frac{f(z)\,dz}{(z - z_0)(z - a)^{n+1}} \\
&= c_0 + c_1(z_0 - a) + \cdots + c_n(z_0 - a)^n + R_n
\end{aligned}$$

where $D$ is any domain containing $z_0$ and $a$ on which $f$ is differentiable, and where

$$c_j = \frac{1}{2\pi i}\int_{\partial D} \frac{f(z)\,dz}{(z - a)^{j+1}} \quad (j = 0, 1, 2, \ldots, n), \qquad R_n = \frac{(z_0 - a)^{n+1}}{2\pi i}\int_{\partial D} \frac{f(z)\,dz}{(z - z_0)(z - a)^{n+1}}.$$

It will be shown that for all complex numbers $z_0$ sufficiently near $a$, the remainder $R_n$ approaches 0 as $n \to \infty$. Hence

$$f(z_0) = \sum_{n=0}^{\infty} c_n (z_0 - a)^n,$$

which shows that every differentiable function is in fact analytic. (A function is said to be analytic if it can be written locally as the sum of a power series.) More specifically:


Theorem

Let $f(z)$ be a complex function of a complex variable which is differentiable at all points inside the disk $\{|z - a| < r\}$ of radius $r$ with center $a$. Then the value $f(z)$ of $f$ at any point $z$ of the disk is equal to the sum of the power series

$$f(z) = \sum_{n=0}^{\infty} c_n (z - a)^n = \lim_{N \to \infty} \sum_{n=0}^{N} c_n (z - a)^n$$

with coefficients

$$c_n = \frac{1}{2\pi i}\int_{\partial D} \frac{f(z)\,dz}{(z - a)^{n+1}}$$

where $D$ is any domain containing $a$ on which $f(z)$ is differentiable. (This number $c_n$ is independent of the choice of $D$, by the argument above.)

Conversely, let $c_0, c_1, c_2, c_3, \ldots$ be complex numbers such that the power series $c_0 + c_1(z - a) + c_2(z - a)^2 + \cdots + c_n(z - a)^n + \cdots$ is convergent for some $z_0 \ne a$. Then it is convergent for all values of $z$ inside the disk $\{|z - a| < |z_0 - a|\}$ and defines a differentiable function

$$f(z) = c_0 + c_1(z - a) + c_2(z - a)^2 + \cdots + c_n(z - a)^n + \cdots$$

whose derivative $f'(z)$ is equal to the sum of the power series

$$f'(z) = c_1 + 2c_2(z - a) + 3c_3(z - a)^2 + \cdots + nc_n(z - a)^{n-1} + \cdots$$

for all $z$ in the disk $\{|z - a| < |z_0 - a|\}$.
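The coefficient formula can be evaluated numerically by taking $D$ to be the disk $\{|z - a| \le \rho\}$. With $z = a + \rho e^{i\theta}$ one has $dz = i(z - a)\,d\theta$, so $c_n$ is the average of $f(z)/(z - a)^n$ over the circle. The sketch below applies that observation with the trapezoid rule; the radius and node count are assumptions. For $f = e^z$, $a = 0$, it recovers $c_n = 1/n!$.

```python
import cmath
import math


def taylor_coefficients(f, a, rho, count, nodes=256):
    """c_0, ..., c_{count-1} from c_n = (1 / 2 pi i) * integral of f(z) dz / (z - a)^(n+1).

    On |z - a| = rho this integral is the average of f(z) / (z - a)^n over the
    circle, approximated here by the mean over equally spaced nodes.
    """
    points = [a + rho * cmath.exp(2j * math.pi * k / nodes) for k in range(nodes)]
    return [sum(f(z) / (z - a) ** n for z in points) / nodes for n in range(count)]


def power_series(coeffs, a, z):
    """Partial sum c_0 + c_1 (z - a) + ... + c_N (z - a)^N."""
    return sum(c * (z - a) ** n for n, c in enumerate(coeffs))


coeffs = taylor_coefficients(cmath.exp, 0, 1.0, 15)
print(max(abs(c - 1 / math.factorial(n)) for n, c in enumerate(coeffs)))  # tiny
print(power_series(coeffs, 0, 0.5 + 0.5j), cmath.exp(0.5 + 0.5j))        # agree
```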


Corollary

If $f(z)$ is (continuously) differentiable on a disk $\{|z - a| < r\}$, then it is in fact infinitely differentiable and the Taylor series converges to the function

$$f(z) = f(a) + f'(a)(z - a) + \frac{f''(a)}{2!}(z - a)^2 + \frac{f'''(a)}{3!}(z - a)^3 + \cdots$$

for all $z$ in $\{|z - a| < r\}$.

Proof of Corollary

By the first part of the theorem $f(z) = c_0 + c_1(z - a) + c_2(z - a)^2 + \cdots$; hence by the second part of the theorem $f'(z) = c_1 + 2c_2(z - a) + 3c_3(z - a)^2 + \cdots$. Hence, again by the second part of the theorem, $f'$ is differentiable and $f''(z) = 2c_2 + 6c_3(z - a) + \cdots + n(n - 1)c_n(z - a)^{n-2} + \cdots$. Repeating this argument it follows that all derivatives exist and that

$$f^{(n)}(z) = n!\,c_n + \cdots + \frac{m!\,c_m}{(m - n)!}(z - a)^{m-n} + \cdots.$$

In particular $f^{(n)}(a) = n!\,c_n$ and

$$f(z) = c_0 + c_1(z - a) + \cdots + c_n(z - a)^n + \cdots = f(a) + f'(a)(z - a) + \cdots + \frac{f^{(n)}(a)}{n!}(z - a)^n + \cdots,$$

as was to be shown.

As a further corollary, the two formulas for $c_n$ give

$$f^{(n)}(a) = \frac{n!}{2\pi i}\int_{\partial D} \frac{f(z)\,dz}{(z - a)^{n+1}}$$

where $D$ is any domain containing $a$ on which $f$ is differentiable.


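Proof of Theorem

Let $f(z)$ be differentiable on the disk $\{|z - a| < r\}$, and let $z_0$ be a given point of this disk. The first statement of the theorem is that the remainder term

$$R_n = \frac{1}{2\pi i}\int_{\partial D} \frac{f(z)}{z - z_0}\Big(\frac{z_0 - a}{z - a}\Big)^{n+1} dz$$

goes to zero as $n \to \infty$. Here $D$ is any domain containing $a$ and $z_0$ on which $f(z)$ is differentiable. Let $\rho < r$ be a number such that the disk $\{|z - a| \le \rho\}$ contains $z_0$ in its interior. Parameterize the boundary of this disk as $z = a + \rho\cos\theta + i\rho\sin\theta$, $0 \le \theta \le 2\pi$, so that $dz = i(z - a)\,d\theta$. This writes $R_n$ as the average value of

$$f(z)\cdot\frac{z_0 - a}{z - z_0}\cdot\Big(\frac{z_0 - a}{z - a}\Big)^n$$

on the circle $|z - a| = \rho$. The first two factors of this product are independent of $n$ and are bounded on the circle (they are continuous there, $z_0$ being inside). The third has modulus

$$\Big(\frac{|z_0 - a|}{|z - a|}\Big)^n = \Big(\frac{|z_0 - a|}{\rho}\Big)^n,$$

which goes to zero as $n \to \infty$ because $|z_0 - a| < \rho$. Hence if $F_n(\theta)$ denotes the above function on $|z - a| = \rho$ and if $\varepsilon > 0$ is given, there is an $N$ such that $|F_n(\theta)| < \varepsilon$ for all $\theta$, $0 \le \theta \le 2\pi$, and for all $n > N$. It follows that

$$|R_n| = \Big|\frac{1}{2\pi}\int_0^{2\pi} F_n(\theta)\,d\theta\Big| \le \varepsilon$$

(consider the integral as a limit of sums and use the triangle inequality $|z_1 + z_2| \le |z_1| + |z_2|$). Hence $|R_n| \to 0$ as $n \to \infty$, as was to be shown.

To prove the second half of the theorem, let $c_0, c_1, c_2, \ldots$ and $z_0 \ne a$ be given such that $\sum_{n=0}^{\infty} c_n(z_0 - a)^n$ is convergent, and let $r = |z_0 - a| > 0$. It is to be shown first of all that $\sum_{n=0}^{\infty} c_n(z - a)^n$ is convergent whenever $|z - a| < r$. This can be done by noting that the numbers $|c_n(z_0 - a)^n| = |c_n|\,r^n$ must be bounded, say $|c_n|\,r^n \le K$ (otherwise $\sum c_n(z_0 - a)^n$ could not converge; see Exercise 3). Hence for $|z - a| = r_1 < r$ and for $N > M$,

$$\begin{aligned}
\Big|\sum_{n=0}^{N} c_n(z - a)^n - \sum_{n=0}^{M} c_n(z - a)^n\Big| &= |c_{M+1}(z - a)^{M+1} + \cdots + c_N(z - a)^N| \\
&\le |c_{M+1}|\,r_1^{M+1} + \cdots + |c_N|\,r_1^N \\
&\le K\Big(\frac{r_1}{r}\Big)^{M+1}\Big[1 + \frac{r_1}{r} + \Big(\frac{r_1}{r}\Big)^2 + \cdots\Big] = K\Big(\frac{r_1}{r}\Big)^{M+1}\frac{1}{1 - r_1/r}.
\end{aligned}$$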


As $M \to \infty$ this goes to zero. This shows that the sequence of partial sums $f_N(z) = \sum_{n=0}^{N} c_n(z - a)^n$ satisfies the Cauchy Criterion for $|z - a| < r$. Hence $\lim_{N \to \infty} f_N(z)$ exists for all $z$ in the disk $\{|z - a| < r\}$. Let $f(z)$ denote this limit. It is to be shown that the function $f(z)$ so defined is differentiable and that its derivative is equal to the sum of the series

$$c_1 + 2c_2(z - a) + 3c_3(z - a)^2 + \cdots$$

for all $z$ in the disk $\{|z - a| < r\}$.

It will be shown first that this series $\sum_{n=1}^{\infty} nc_n(z - a)^{n-1}$ is convergent in the disk $\{|z - a| < r\}$. To show this it suffices to show that the numbers $n|c_n|\,r_1^{n-1}$ are bounded for $r_1 < r$. Then the argument above shows that $\lim_{N \to \infty} \sum_{n=1}^{N} nc_n(z - a)^{n-1}$ exists for $|z - a| < r_1$. Since $r_1 < r$ is arbitrary, it follows that the limit exists for all $z$ in the disk $\{|z - a| < r\}$. Now $n|c_n|\,r_1^{n-1} \le nKr^{-n}r_1^{n-1} = n(r_1/r)^{n-1}r^{-1}K$, so it suffices to show that if $0 < \rho < 1$, then the numbers $n\rho^{n-1}$ are bounded. This follows from

$$\begin{aligned}
n\rho^{n-1} &= \rho^{n-1} + \rho^{n-1} + \cdots + \rho^{n-1} \quad (n \text{ terms}) \\
&\le (1 + \rho + \rho^2 + \cdots) + (\rho + \rho^2 + \cdots) + (\rho^2 + \rho^3 + \cdots) + \cdots + (\rho^{n-1} + \rho^n + \cdots) \\
&= \frac{1}{1 - \rho} + \frac{\rho}{1 - \rho} + \frac{\rho^2}{1 - \rho} + \cdots + \frac{\rho^{n-1}}{1 - \rho} \\
&\le \frac{1}{1 - \rho}\,(1 + \rho + \rho^2 + \cdots) = \frac{1}{(1 - \rho)^2}.
\end{aligned}$$


It remains to show that $g(z) = \sum_{n=1}^{\infty} nc_n(z - a)^{n-1}$ is the derivative of $f(z)$ for $z$ in the disk $\{|z - a| < r\}$. The idea is that

(4) $$\frac{f_N(z + h) - f_N(z)}{h} \approx g_N(z)$$


where $g_N(z) = \sum_{n=1}^{N} nc_n(z - a)^{n-1}$. Hence passing to the limit as $N \to \infty$

(5) $$\frac{f(z + h) - f(z)}{h} \approx g(z).$$

One must estimate the error in the approximation (4) and use the estimate to conclude that the error in the approximation (5) goes to zero as $|h| \to 0$. The error in (4) is estimated, as usual, by writing it as an integral:

$$\frac{f_N(z + h) - f_N(z)}{h} - g_N(z) = \frac{1}{h}\int_z^{z+h} g_N(w)\,dw - g_N(z) = \frac{1}{h}\int_z^{z+h} [g_N(w) - g_N(z)]\,dw$$

where the integral is over a path from $z$ to $z + h$, say over the line segment joining them. (The first equality is the Fundamental Theorem of Calculus (2) applied to the polynomial $f_N$, whose derivative is $g_N$.) Parameterizing this segment as $w = z + th$, $0 \le t \le 1$, so that $dw = h\,dt$, gives

$$\begin{aligned}
\frac{f_N(z + h) - f_N(z)}{h} - g_N(z) &= \int_0^1 [g_N(z + th) - g_N(z)]\,dt \\
&= \sum_{n=1}^{N} nc_n \int_0^1 [(z + th - a)^{n-1} - (z - a)^{n-1}]\,dt.
\end{aligned}$$

Now

$$\begin{aligned}
\int_0^1 [(z + th - a)^{n-1} - (z - a)^{n-1}]\,dt &= \int_0^1 [(z + th - a) - (z - a)]\,[(z + th - a)^{n-2} + (z + th - a)^{n-3}(z - a) + \cdots + (z - a)^{n-2}]\,dt \\
&= h\int_0^1 t\,[(z + th - a)^{n-2} + \cdots + (z - a)^{n-2}]\,dt.
\end{aligned}$$

The integrand is a sum of $n - 1$ terms, each of which has modulus at most $(|z - a| + |h|)^{n-2}$. Setting $r_1 = |z - a| + |h|$, this gives

$$\Big|\frac{f_N(z + h) - f_N(z)}{h} - g_N(z)\Big| \le |h| \sum_{n=1}^{N} n(n - 1)|c_n|\,r_1^{n-2}.$$

The convergence of the series $\sum_{n=1}^{\infty} n(n - 1)|c_n|\,r_1^{n-2}$ for $r_1 < r$ follows from the convergence of $\sum_{n=1}^{\infty} n|c_n|\,r_1^{n-1}$. It does so in the same way that the convergence of the latter series followed from the convergence of $\sum_{n=0}^{\infty} |c_n|\,r_1^n$. Let $K' = \sum_{n=1}^{\infty} n(n - 1)|c_n|\,r_1^{n-2}$ for some $r_1$ satisfying $|z - a| < r_1 < r$. Letting $N \to \infty$, it follows that

$$\Big|\frac{f(z + h) - f(z)}{h} - g(z)\Big| \le K'|h|$$

for all non-zero $h$ such that $|z - a| + |h| < r_1$, and the theorem follows.
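The estimate just proved can be watched in a concrete case. Assume, purely for illustration, the geometric series $c_n = 1$, $a = 0$ (so $r = 1$, $f(z) = 1/(1 - z)$ and $g(z) = 1/(1 - z)^2$). The sketch compares the difference quotient of $f_N$ with $g_N$, and with the bound $|h|\sum n(n - 1)|c_n|\,r_1^{n-2}$, where $r_1 = |z| + |h|$. The point, step and truncation are assumptions.

```python
def partial_sums(z, N):
    """f_N(z) = sum_{n=0}^N z^n and g_N(z) = sum_{n=1}^N n z^(n-1), i.e. c_n = 1, a = 0."""
    f_N = sum(z ** n for n in range(N + 1))
    g_N = sum(n * z ** (n - 1) for n in range(1, N + 1))
    return f_N, g_N


def error_bound(z, h, N):
    """|h| * sum_{n=2}^N n (n - 1) r1^(n - 2) with r1 = |z| + |h| (the n = 1 term is 0)."""
    r1 = abs(z) + abs(h)
    return abs(h) * sum(n * (n - 1) * r1 ** (n - 2) for n in range(2, N + 1))


z, h, N = 0.3 + 0.2j, 1e-3 * (1 + 1j), 200
f_z, g_z = partial_sums(z, N)
f_zh, _ = partial_sums(z + h, N)
error = abs((f_zh - f_z) / h - g_z)
print(error, error_bound(z, h, N))  # the error is below the bound
print(abs(g_z - 1 / (1 - z) ** 2))  # g_N(z) is already very close to g(z)
```

Shrinking $h$ shrinks the bound roughly in proportion to $|h|$, which is the content of the final inequality $|\ldots| \le K'|h|$.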

The theorem just proved is the central fact in the theory 
of functions of a complex variable. It has many applica- 
tions to the theory of functions of real variables, of 
which the following are a few of the most important 
examples. 


The Binomial Series

Let $\alpha = \pm p/q$ be any rational number ($p$, $q$ positive integers), and let $x^\alpha$ be the function $\sqrt[q]{x^{\pm p}}$ defined for $x > 0$. Then the power series representation

$$(1 + x)^\alpha = 1 + \alpha x + \binom{\alpha}{2}x^2 + \binom{\alpha}{3}x^3 + \cdots$$

where

$$\binom{\alpha}{n} = \frac{\alpha(\alpha - 1)(\alpha - 2)\cdots(\alpha - n + 1)}{n!}$$

is valid for $|x| < 1$. This is proved by showing that $(1 + x)^\alpha$ can be extended to a differentiable complex function of a complex variable as follows. Note first that the power series $\sum_{n=0}^{\infty} z^n/n!$ converges for all real values of $z$. Hence the function defined by

$$e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!} = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots$$

is defined and differentiable for all complex $z$, and its derivative is $e^z$. The function $e^{z_0 + z}$ is defined for all $z$ and is its own derivative. Therefore its Taylor series for $a = 0$ is

$$e^{z_0 + z} = e^{z_0} + e^{z_0}z + e^{z_0}\frac{z^2}{2!} + \cdots = e^{z_0}e^z.$$

Hence $e^{z_0 + z_1} = e^{z_0}e^{z_1}$ for all $z_0$, $z_1$. In particular $e^z$ is never zero and $1/e^z$ is $e^{-z}$. The series $1 - \frac12 + \frac13 - \frac14 + \cdots$ converges (by the alternating series test). It follows that the function defined by

$$\log(1 + z) = z - \frac{z^2}{2} + \frac{z^3}{3} - \frac{z^4}{4} + \cdots$$

is defined and differentiable for $|z| < 1$, and that its derivative is

$$1 - z + z^2 - z^3 + \cdots = \frac{1}{1 + z}.$$

The composed function $e^{\log(1 + z)}$ is therefore defined and differentiable for $|z| < 1$. Its derivative is

$$e^{\log(1 + z)}\cdot\frac{1}{1 + z}$$

and its second derivative is

$$e^{\log(1 + z)}\cdot\frac{1}{(1 + z)^2} + e^{\log(1 + z)}\cdot\frac{-1}{(1 + z)^2} = 0.$$

At $z = 0$ the composed function has value $e^{\log 1} = e^0 = 1$ and first derivative 1. Hence the Taylor series of the composed function for $a = 0$ is

$$e^{\log(1 + z)} = 1 + z.$$

The function $e^{\alpha\log(1 + z)}$ for any (real or complex) number $\alpha$ is defined and differentiable for $|z| < 1$, and its derivative is

$$e^{\alpha\log(1 + z)}\cdot\frac{\alpha}{1 + z} = \alpha\,\frac{e^{\alpha\log(1 + z)}}{e^{\log(1 + z)}} = \alpha\,e^{(\alpha - 1)\log(1 + z)}.$$

Its second derivative is therefore

$$\alpha(\alpha - 1)\,e^{(\alpha - 2)\log(1 + z)},$$

etc., which gives the Taylor series

$$e^{\alpha\log(1 + z)} = 1 + \alpha z + \binom{\alpha}{2}z^2 + \binom{\alpha}{3}z^3 + \cdots$$

valid for $|z| < 1$. On the other hand, let $\alpha = \pm p/q$ and let $x$ be a real number with $|x| < 1$. Then $e^{\alpha\log(1 + x)}$ is a real number, since the series has real coefficients $\binom{\alpha}{n}$. It is positive, because it is the square of the non-zero real number $e^{\alpha\log(1 + x)/2}$. Finally, it satisfies

$$\big(e^{\alpha\log(1 + x)}\big)^q = e^{q\alpha\log(1 + x)} = e^{\pm p\log(1 + x)} = \big(e^{\log(1 + x)}\big)^{\pm p} = (1 + x)^{\pm p}.$$

Hence $e^{\alpha\log(1 + x)} = \sqrt[q]{(1 + x)^{\pm p}} = (1 + x)^\alpha$, as was to be shown.
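The conclusion can be checked numerically for particular exponents. The sketch below builds $\binom{\alpha}{n}$ from the product formula and sums the series. The exponents, the test points and the number of terms are assumptions, chosen so that the truncation error is negligible for the values of $x$ used.

```python
def binom(alpha, n):
    """alpha (alpha - 1) ... (alpha - n + 1) / n!, built up one factor at a time."""
    result = 1.0
    for k in range(n):
        result *= (alpha - k) / (k + 1)
    return result


def binomial_series(alpha, x, terms=200):
    """Partial sum 1 + alpha x + binom(alpha, 2) x^2 + ... with the given number of terms."""
    return sum(binom(alpha, n) * x ** n for n in range(terms))


print(binomial_series(-2 / 3, 0.5), 1.5 ** (-2 / 3))  # alpha = -p/q with p = 2, q = 3
print(binomial_series(1 / 2, -0.75), 0.25 ** 0.5)     # a square root: both 0.5
```

When $\alpha$ is a non-negative integer the factor $\alpha - \alpha = 0$ makes every later coefficient vanish, so the series reduces to the ordinary binomial expansion.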


The Fundamental Theorem of Algebra

Every polynomial $f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$ with real coefficients $a_i$ can be written as a product of factors of the form $(x - A)$ and $(x^2 + 2Bx + C)$. Here $A$, $B$, $C$ are real numbers and $C > B^2$, so that $x^2 + 2Bx + C = (x + B)^2 + (C - B^2)$ is positive for all $x$. Every polynomial $f(z) = z^n + a_{n-1}z^{n-1} + \cdots + a_1z + a_0$ with complex coefficients can be written as a product of factors of the form $(z - A)$, where $A$ is a complex number.

To prove these statements, let $f(z) = \sum_{j=0}^{n} a_j z^j$ (with $a_n = 1$) be considered as a function of a complex variable. If $z_0$ is any complex number, then the polynomial $f(z)$ can be written in the form $f(z) = (z - z_0)g(z) + r$, where $g(z)$ is a polynomial of degree $n - 1$, and where $r$ is a complex number. (This is simple algebraic division of polynomials.) If $z_0$ can be chosen so that $f(z_0) = 0$, then $r = 0$ and $f(z) = (z - z_0)g(z)$. If $z_0$ is real or if the given polynomial has complex coefficients, then the factor $z - z_0$ is of the desired type. The factorization of the given polynomial is then reduced to the factorization of a polynomial of lower degree.

Suppose instead that $z_0 = x_0 + iy_0$ with $y_0 \ne 0$ and that the given polynomial has real coefficients. Set $B = -x_0$ and $C = x_0^2 + y_0^2$. Use division to write $f(x) = (x^2 + 2Bx + C)g(x) + r_1x + r_0$, where $g(x)$ is a polynomial with real coefficients whose degree is 2 less than that of $f$, and where $r_1$, $r_0$ are real numbers. Since $f(z_0) = 0$ and $z_0^2 + 2Bz_0 + C = 0$, the complex substitution $x = z_0$ gives $r_1z_0 + r_0 = 0$. Taking imaginary parts gives first $r_1 y_0 = 0$, so $r_1 = 0$, and then $r_0 = 0$. Hence $f(x) = (x^2 + 2Bx + C)g(x)$, and the factorization of $f$ is reduced to the factorization of the polynomial $g$ of lower degree.

Thus the factorization of any given $f$ can always be reduced to the factorization of a polynomial of lower degree, provided one can find a complex number $z_0$ such that $f(z_0) = 0$. The theorem will therefore be proved if it is shown that every polynomial $f(z) = z^n + a_{n-1}z^{n-1} + \cdots + a_0$ with complex coefficients and with degree $n > 0$ has at least one root $z_0$. That is, there must be at least one complex number $z_0$ such that $f(z_0) = 0$. This is proved as follows:

If $f(z)$ is never zero, then $1/f(z)$ is differentiable for all $z$; hence by the theorem (and its corollary)

$$\frac{1}{f(z)} = c_0 + c_1z + \frac{c_2}{2!}z^2 + \cdots + \frac{c_m}{m!}z^m + \cdots,$$

where

$$c_m = \frac{m!}{2\pi i}\int_{|z| = \rho} \frac{dz}{f(z)\,z^{m+1}}$$

for arbitrarily large $\rho$. Thus $c_m$ is $m!$ times the average value on the circle $|z| = \rho$ of the function

$$\frac{1}{f(z)\,z^m} = \frac{1}{z^{n+m}}\cdot\frac{1}{a_n + a_{n-1}z^{-1} + \cdots + a_0z^{-n}} \qquad (a_n = 1).$$

Fix $\varepsilon$ with $0 < \varepsilon < |a_n|$. For large $\rho$ the modulus of the second factor is at most $(|a_n| - \varepsilon)^{-1}$, and the modulus of the first factor is constantly $\rho^{-n-m}$. Thus $|c_m| \le m!\,(|a_n| - \varepsilon)^{-1}\rho^{-n-m}$ for all sufficiently large $\rho$, and hence $c_m = 0$ unless $m = n = 0$. Thus if $f(z)$ is never zero, $f(z) = c_0^{-1}$ and the degree of $f$ must be zero.
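The division step used above ('simple algebraic division of polynomials') can be sketched as ordinary long division on coefficient lists. This is an illustration, not part of the proof, and the example polynomial is an assumption. Dividing by $z - z_0$ leaves the remainder $r = f(z_0)$. Dividing a real polynomial by $x^2 + 2Bx + C$, with $B = -x_0$ and $C = x_0^2 + y_0^2$ for a root $z_0 = x_0 + iy_0$, leaves $r_1 = r_0 = 0$.

```python
def divide(f, d):
    """Long division f = d*q + r for coefficient lists (highest degree first).

    Coefficients may be real or complex; d[0] must be non-zero.  Returns (q, r),
    where r has one coefficient fewer than d.
    """
    rem = list(f)
    q = []
    for i in range(len(f) - len(d) + 1):
        coeff = rem[i] / d[0]
        q.append(coeff)
        for j, dj in enumerate(d):
            rem[i + j] -= coeff * dj
    return q, rem[len(rem) - len(d) + 1:]


f = [1, -1, 1, -1]            # x^3 - x^2 + x - 1 = (x - 1)(x^2 + 1)
print(divide(f, [1, -2]))     # ([1.0, 1.0, 3.0], [5.0]): remainder f(2) = 5
print(divide(f, [1, -1j]))    # remainder f(i) = 0
x0, y0 = 0.0, 1.0             # the root z0 = i
B, C = -x0, x0 ** 2 + y0 ** 2
print(divide(f, [1, 2 * B, C]))   # ([1.0, -1.0], [0.0, 0.0])
```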


The Implicit Function Theorem
for Analytic Functions

Let $f(x)$ be a differentiable function of a real variable, with $f'(\bar x) \ne 0$ so that the Implicit Function Theorem can be applied to solve $y = f(x)$ for $x = g(y)$. If the function $f(x)$ is analytic, then the inverse function $g(y)$ is also analytic. This is proved by using the power series

$$f(x) = c_0 + c_1(x - \bar x) + c_2(x - \bar x)^2 + \cdots$$

to extend $f$ to a differentiable function of a complex variable $f(z)$ defined for $z$ near $\bar x$. The Implicit Function Theorem for complex functions $w = f(z)$ is proved by successive approximations exactly as for real functions, and yields a differentiable complex function $g(w) = z$. But since $g$ is differentiable in the complex sense, it is in fact analytic; hence its restriction to real values $g(y) = x$ is analytic, as was to be shown. The generalization to functions of several variables can be proved in this way, by showing that at each stage of a step-by-step elimination the new functions are analytic if the original ones were.
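The successive approximations mentioned above can be sketched for a single complex equation $w = f(z)$. The version below is one standard form of the method, assumed here rather than quoted from the text: starting from $z = \bar x$, repeat $z \leftarrow z - (f(z) - w)/f'(\bar x)$. The example $f(z) = z + z^2$, $\bar x = 0$, is hypothetical. Its inverse near 0 is the branch $(-1 + \sqrt{1 + 4w})/2$, and for real $w$ the iterates stay real.

```python
import cmath


def solve_by_successive_approximation(f, fprime_at_base, base, w, steps=100):
    """Approximate the solution z of f(z) = w near `base` by z <- z - (f(z) - w) / f'(base).

    The derivative is frozen at the base point; for w close enough to f(base) the
    map is a contraction and the iterates converge.
    """
    z = base
    for _ in range(steps):
        z = z - (f(z) - w) / fprime_at_base
    return z


def f(z):
    return z + z * z   # f(0) = 0, f'(0) = 1


w = 0.1 + 0.05j
z = solve_by_successive_approximation(f, 1.0, 0j, w)
print(z, (-1 + cmath.sqrt(1 + 4 * w)) / 2)                  # the same point
print(solve_by_successive_approximation(f, 1.0, 0j, 0.2))  # real w gives a real result
```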


Harmonic Functions are Analytic

If $u(x, y)$ is a harmonic function of two variables defined near $(x_0, y_0)$, then the Taylor series

$$u(x_0, y_0) + \frac{\partial u}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial u}{\partial y}(x_0, y_0)(y - y_0) + \frac{1}{2}\frac{\partial^2 u}{\partial x^2}(x_0, y_0)(x - x_0)^2 + \cdots + \frac{1}{j!\,k!}\frac{\partial^{j+k} u}{\partial x^j\,\partial y^k}(x_0, y_0)(x - x_0)^j(y - y_0)^k + \cdots$$

converges to the function $u(x, y)$ for all $(x, y)$ near $(x_0, y_0)$. This is proved by defining $v(x, y)$ by

$$dv = -\frac{\partial u}{\partial y}\,dx + \frac{\partial u}{\partial x}\,dy.$$

This determines $v$ locally up to an additive constant, since the 1-form on the right is closed because $u$ is harmonic. Then $f(x + iy) = u(x, y) + iv(x, y)$ is a differentiable function of a complex variable. Hence $f$ is analytic by the theorem, and the power series for $f$ gives power series for $u$ and $v$.
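The construction of $v$ can be carried out numerically by integrating $dv = -\frac{\partial u}{\partial y}\,dx + \frac{\partial u}{\partial x}\,dy$ along the segment from the origin. The sketch makes three assumptions: the hypothetical harmonic function $u = x^3 - 3xy^2$ (so $u_x = 3x^2 - 3y^2$, $u_y = -6xy$), the normalization $v(0, 0) = 0$, and the midpoint rule. The result should match $v = 3x^2y - y^3$, the imaginary part of $z^3$.

```python
def conjugate_by_line_integral(u_x, u_y, x, y, n=1000):
    """v(x, y) - v(0, 0) = integral of -u_y dx + u_x dy along t -> (t x, t y), 0 <= t <= 1.

    u_x, u_y are the partial derivatives of u, supplied by the caller.  Since u is
    harmonic the 1-form is closed, so (locally) any path would give the same value.
    The integral in t is approximated by the midpoint rule with n subintervals.
    """
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        px, py = t * x, t * y
        total += -u_y(px, py) * x + u_x(px, py) * y   # dx = x dt, dy = y dt
    return total / n


def u(x, y):
    return x ** 3 - 3 * x * y ** 2


def u_x(x, y):
    return 3 * x ** 2 - 3 * y ** 2


def u_y(x, y):
    return -6 * x * y


x, y = 0.4, 0.7
v = conjugate_by_line_integral(u_x, u_y, x, y)
print(complex(u(x, y), v), complex(x, y) ** 3)   # u + iv agrees with z^3
```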


Poisson Integral Formula

Let $U(x, y)$ be a harmonic function on the disk $\{x^2 + y^2 \le 1\}$ and let $(x_0, y_0)$ be a point inside this disk. The discussion of §8.3 suggests a change of coordinates of the form

$$w = \frac{z - z_0}{z - z_1}$$

where $z_0 = x_0 + iy_0$, and where $z_1$ is a point outside the disk chosen in such a way that the circle $|z| = 1$ is a circle of the form $|w| = \text{const.}$ It can be shown (Exercise 13) that $z_1 = x_1 + iy_1$ has this property if and only if

$$x_0x_1 + y_0y_1 = 1, \qquad x_0y_1 - y_0x_1 = 0,$$

which can be expressed simply as $\bar z_0 z_1 = 1$, where $\bar z_0$ denotes the 'complex conjugate' $x_0 - iy_0$ of $z_0$. Express $z$ as a function of $w$, and hence express $(x, y)$ as functions of $(u, v)$, where $w = u + iv$. The functions $(x, y)$ are then conformal coordinates on the uv-plane. Hence the harmonic function $U(x, y)$ becomes a harmonic function $U(u, v)$ whose value at $u = 0$, $v = 0$ ($x = x_0$, $y = y_0$) is

$$U(x_0, y_0) = \frac{1}{2\pi i}\int_{|w| = \text{const.}} U(u, v)\,\frac{dw}{w}.$$

To express this as an integral over $|z| = 1$ it suffices to express the 1-form $dw/w$ in terms of $z$. Now

$$\frac{dw}{w} = \frac{dz}{z - z_0} - \frac{dz}{z - z_1} = dz\left[\frac{\bar z - \bar z_0}{z\bar z - z_0\bar z - z\bar z_0 + z_0\bar z_0} - \frac{\bar z z_0\bar z_0 - \bar z_0}{z\bar z z_0\bar z_0 - z\bar z_0 - \bar z z_0 + 1}\right]$$

where $\bar z = x - iy$ and where the identity $\bar z_0 z_1 = 1$ is used. On the circle $|z| = 1$ one has $z\bar z = |z|^2 = 1$. Hence the denominators are equal, the $\bar z_0$'s in the numerators cancel, and

$$\frac{dw}{w} = dz\,\frac{\bar z\,(1 - z_0\bar z_0)}{(z - z_0)(\bar z - \bar z_0)} = \frac{dz}{z}\left(\frac{1 - z_0\bar z_0}{(z - z_0)(\bar z - \bar z_0)}\right)$$

on $z\bar z = 1$. Thus


$$\begin{aligned}
U(x_0, y_0) &= \frac{1}{2\pi i}\int_{|w| = \text{const.}} U(u, v)\,\frac{dw}{w} \\
&= \frac{1}{2\pi i}\int_{|z| = 1} U(x, y)\,\frac{1 - x_0^2 - y_0^2}{(x - x_0)^2 + (y - y_0)^2}\,\frac{dz}{z} \\
&= \frac{1}{2\pi}\int_0^{2\pi} U(\cos\theta, \sin\theta)\,\frac{1 - x_0^2 - y_0^2}{(\cos\theta - x_0)^2 + (\sin\theta - y_0)^2}\,d\theta
\end{aligned}$$

which is the Poisson Integral Formula for $r = 1$, $a = b = 0$. The general formula is obtained by a simple change of coordinates. (Note that the proof of the Poisson Integral Formula uses only the arithmetic of complex numbers to simplify the expression of $dw/w$ in terms of $z$. It does not depend on the theorem that differentiable complex functions are analytic.)
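The formula lends itself to a direct numerical check. The sketch below is an illustration; the test functions, the point and the number of nodes are assumptions. It evaluates the right side with equally spaced $\theta$ and compares it with $U(x_0, y_0)$ for the harmonic function $U = x^2 - y^2$. The constant function $U = 1$ checks that the kernel has average 1.

```python
import math


def poisson_unit_disk(U, x0, y0, n=2000):
    """(1 / 2 pi) * integral over [0, 2 pi] of U(cos t, sin t) times the Poisson kernel (trapezoid rule)."""
    total = 0.0
    for k in range(n):
        theta = 2 * math.pi * k / n
        c, s = math.cos(theta), math.sin(theta)
        kernel = (1 - x0 ** 2 - y0 ** 2) / ((c - x0) ** 2 + (s - y0) ** 2)
        total += U(c, s) * kernel
    return total / n   # (1 / 2 pi) * (2 pi / n) = 1 / n


print(poisson_unit_disk(lambda x, y: x * x - y * y, 0.3, 0.4))  # about 0.09 - 0.16 = -0.07
print(poisson_unit_disk(lambda x, y: 1.0, 0.3, 0.4))            # about 1
```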


Exercises


1 Use the Schwarz inequality (for real numbers) to prove the triangle inequality $|z_1 + z_2| \le |z_1| + |z_2|$ for complex numbers.


2 Prove the Cauchy Convergence Criterion for sequences $z_1, z_2, z_3, \ldots$ of complex numbers using the Cauchy Convergence Criterion for sequences of real numbers. [Use

$$\max(|x|, |y|) \le \sqrt{x^2 + y^2} \le 2\max(|x|, |y|)$$

to prove that a complex number $z = x + iy$ is 'small' if and only if $x$, $y$ are both 'small'.]


3 Show that if an infinite series $\lim_{N \to \infty}(z_1 + z_2 + z_3 + \cdots + z_N)$ converges, then the terms are bounded, that is, $|z_n| \le K$ (all $n$) for some $K$. [Use the Cauchy Criterion plus the fact that a finite set is bounded.]


4 In what way does the operation of multiplication by 2 transform the z-plane? Multiplication by $-1$? By 0? Show that a linear transformation

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$$

in which $a^2 + b^2 = 1$ is a rotation of the z-plane (defining 'rotation' in some suitable way). Show that any linear transformation $\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$ is a rotation and a change of scale by a (positive) scale factor except when $a = b = 0$.


5 Let $a = \cos\frac{2\pi}{5}$, $b = \sin\frac{2\pi}{5}$. Express in terms of $a$ and $b$ the coordinates of the vertices of the regular pentagon inscribed in the circle $|z| = 1$ with one point at $z = 1$. Deduce algebraic relations satisfied by $a$ and $b$ and show that

$$a = \frac{\sqrt5 - 1}{4}, \qquad b = \frac{\sqrt{5 + \sqrt5}}{2\sqrt2}.$$


6 Find all complex numbers $z$ which satisfy $z^2 + 1 = z$ and plot them in the z-plane. [$z^6 = 1$]


7 Show that if $e^z$ is defined as in the text, then $e^{i\theta} = \cos\theta + i\sin\theta$ by showing that the components of $e^{i\theta}$ satisfy the differential equations which define the trigonometric functions.


8 Show that $e^z = \lim_{n \to \infty}\big(1 + \frac{z}{n}\big)^n$ for any complex number $z$.


9 Prove Liouville’s Theorem: If f(z) is differentiable for all z 
and bounded (there is a K such that | f(z)| < K for all z), 
then f(z) must be constant. [Use the method of the proof of 
the fundamental theorem of algebra.] 


10 Prove the Chain Rule: If $f(z)$, $g(z)$ are (continuously) differentiable functions such that the composed function $f[g(z)]$ is defined (i.e., the domain of $f$ includes the range of $g$), then the composed function is differentiable with derivative $f'[g(z)]\,g'(z)$. [Prove that

$$\lim_{s \to 0}\frac{f(x + sh) - f(x)}{s} = f'(x)\,h$$

and that this holds uniformly in $h$. That is, show that for every $\varepsilon > 0$, $K > 0$ there is a $\delta > 0$ such that

$$\Big|\frac{f(x + sh) - f(x)}{s} - f'(x)\,h\Big| < \varepsilon$$

whenever $|h| \le K$, $|s| < \delta$, $s \ne 0$. Then use

$$\frac{f[g(x + s)] - f[g(x)]}{s} = \frac{f\big[g(x) + s\cdot\frac{g(x + s) - g(x)}{s}\big] - f[g(x)]}{s}.$$

To prove that the derivative is continuous, it suffices to show that compositions and products of continuous functions are continuous.]


11 Give an example of a real function of a real variable 
which is (continuously) differentiable but not analytic. 


12 Show that the integral formula for $f^{(n)}(a)$, obtained as a corollary of the theorem, can also be obtained by differentiating under the integral sign in the Cauchy Integral Formula.


13 Let $z_0$, $z_1$ be given complex numbers such that $z_0 \ne z_1$. Show that if $c$ is any positive number, then the complex numbers $z$ such that

$$\Big|\frac{z - z_0}{z - z_1}\Big| = c$$

form a circle in the z-plane except when $c = 1$. Show that this circle coincides with the circle $|z| = 1$ if and only if $\bar z_0 z_1 = 1$. [Express the equation in terms of $x$ and $y$ and use elementary Cartesian geometry.]


14 Alternative proof of the Fundamental Theorem of Algebra. As was shown in the text, the Fundamental Theorem of Algebra is essentially the statement that every real polynomial has at least one complex root, since a complex root gives a factor of the polynomial. Now a complex root of a real polynomial is evidenced by the fact that the power series for its reciprocal is not convergent. For example, the complex roots of $x^2 + 1$ 'explain' the fact that the power series

$$\frac{1}{1 + x^2} = 1 - x^2 + x^4 - x^6 + \cdots$$

does not converge for $|x| > 1$. More specifically, suppose $f(x)$ is a real polynomial with no complex roots $f(z) = 0$ satisfying $|z| < K$. Then the theorem of the text shows that the Taylor series of $1/f(x)$ converges to $1/f(x)$ for all $x$ satisfying $|x| < K$. Hence, using the theorem of the text, the Fundamental Theorem of Algebra is reduced to the following statement about real numbers:

Let $f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_0$ be a polynomial of degree $\ge 1$ with real coefficients. Then its reciprocal $1/f(x)$, considered as a function of $x$, cannot be expanded as a power series

$$\frac{1}{f(x)} = c_0 + c_1x + c_2x^2 + c_3x^3 + \cdots$$

which converges for all $x$.

To prove this, one first shows that the $c$'s must satisfy the relations

$$\begin{aligned}
a_0c_0 &= 1 \\
a_0c_1 + a_1c_0 &= 0 \\
a_0c_2 + a_1c_1 + a_2c_0 &= 0 \\
&\;\;\vdots \\
a_0c_n + a_1c_{n-1} + \cdots + a_nc_0 &= 0 \\
a_0c_{n+1} + a_1c_n + \cdots + a_nc_1 &= 0 \\
&\;\;\vdots \\
a_0c_{n+k} + a_1c_{n+k-1} + \cdots + a_nc_k &= 0.
\end{aligned}$$

That is, one shows that formal multiplication of the power series $c_0 + c_1x + c_2x^2 + \cdots$ by the polynomial $a_0 + a_1x + \cdots + a_nx^n$ gives the power series $1 + 0\cdot x + 0\cdot x^2 + 0\cdot x^3 + \cdots = 1$. This can be proved without reference to complex numbers (Exercise 8, §9.6) and will be assumed here. [Note in particular that $a_0$ must be non-zero.] The problem is to show that if these relations are used to define the $c$'s, then $c_0 + c_1x + c_2x^2 + \cdots$ cannot converge for all $x$. This is proved by noting that the $c$'s so defined satisfy

$$\begin{pmatrix} c_{k+1} \\ c_{k+2} \\ \vdots \\ c_{k+n} \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -\frac{a_n}{a_0} & -\frac{a_{n-1}}{a_0} & -\frac{a_{n-2}}{a_0} & \cdots & -\frac{a_1}{a_0} \end{pmatrix}\begin{pmatrix} c_k \\ c_{k+1} \\ \vdots \\ c_{k+n-1} \end{pmatrix}.$$

Let $A$ denote the $n \times n$ matrix in this relation. Show that there is a number $\alpha > 0$ such that $|Av| \ge \alpha|v|$ for any $v$ in $\mathbb{R}^n$, where $|v| = \max(|v_1|, |v_2|, \ldots, |v_n|)$. Then show that there is a constant $K$ such that any string $c_{k+1}, c_{k+2}, \ldots, c_{k+n}$ of $n$ consecutive $c$'s must contain at least one $c$ satisfying $|c_{k+j}| \ge K\alpha^{k+j}$. Conclude that $c_0 + c_1x + c_2x^2 + \cdots$ does not converge for $x = \alpha^{-1}$.*

*This proof is closely related to a method, known as Bernoulli's method, of finding complex roots of polynomials. See F. B. Hildebrand, Introduction to Numerical Analysis, McGraw-Hill, 1956.


15 Show that if $a_0 \ne 0$, then the power series $c_0 + c_1x + c_2x^2 + \cdots$ of Exercise 14 does converge for all sufficiently small $x$. Use the theorem of the text to conclude that the limit is indeed $1/f(x)$ for all sufficiently small $x$.


8.5 Integrability Conditions


A curve $f(x, y) = \text{const.}$ in the xy-plane is said to be a solution of the differential equation $A\,dx + B\,dy = 0$ (where $A$, $B$ are functions of $x$, $y$) if at each point $(\bar x, \bar y)$ of the curve the 1-form $df$ is a multiple of the 1-form $A\,dx + B\,dy$. As was explained in §8.2, the geometrical meaning of this definition is that the tangent line

$$\frac{\partial f}{\partial x}(\bar x, \bar y)(x - \bar x) + \frac{\partial f}{\partial y}(\bar x, \bar y)(y - \bar y) = 0$$

coincides with the line

$$A(\bar x, \bar y)(x - \bar x) + B(\bar x, \bar y)(y - \bar y) = 0$$

specified by the differential equation $A\,dx + B\,dy = 0$. The solution is said to be non-singular at $(\bar x, \bar y)$ if $df$ and $A\,dx + B\,dy$ are both non-zero at $(\bar x, \bar y)$ (so that the above equations actually determine lines). Only non-singular solutions will be considered in what follows.

Similarly, a surface $f(x, y, z) = \text{const.}$ in xyz-space is said to be a non-singular solution of the differential equation $A\,dx + B\,dy + C\,dz = 0$ (where $A$, $B$, $C$ are functions of $x$, $y$, $z$) under the following condition: at each point $(\bar x, \bar y, \bar z)$ of the surface the 1-forms $df$ and $A\,dx + B\,dy + C\,dz$ are non-zero and are multiples of each other. Geometrically this means that the tangent plane to the surface ($df = 0$) coincides with the plane specified by the differential equation ($A\,dx + B\,dy + C\,dz = 0$).

*Solutions will be considered locally. That is, it will be assumed that in the neighborhood of any point of the solution manifold the points of the manifold can be described by $k$ equations $f_i = \text{const.}$ satisfying the given condition.

A pair of differential equations

$$A_1\,dx + B_1\,dy + C_1\,dz = 0, \qquad A_2\,dx + B_2\,dy + C_2\,dz = 0$$

in 3 variables can be regarded as specifying a line

$$\begin{aligned}
A_1(\bar x, \bar y, \bar z)(x - \bar x) + B_1(\bar x, \bar y, \bar z)(y - \bar y) + C_1(\bar x, \bar y, \bar z)(z - \bar z) &= 0 \\
A_2(\bar x, \bar y, \bar z)(x - \bar x) + B_2(\bar x, \bar y, \bar z)(y - \bar y) + C_2(\bar x, \bar y, \bar z)(z - \bar z) &= 0
\end{aligned}$$

through each point $(\bar x, \bar y, \bar z)$ of xyz-space. These 2 equations in 3 unknowns describe a line if and only if they have rank 2. This is true if and only if $\omega_1\omega_2 \ne 0$, where $\omega_1 = A_1\,dx + B_1\,dy + C_1\,dz$ and $\omega_2 = A_2\,dx + B_2\,dy + C_2\,dz$. Accordingly, a pair of differential equations $\omega_1 = 0$, $\omega_2 = 0$ is said to be non-singular at a point $(\bar x, \bar y, \bar z)$ if the 2-form $\omega_1\omega_2$ is not zero at that point.

A curve $f = \text{const.}$, $g = \text{const.}$ in xyz-space is said to be a non-singular solution of the differential equations $\omega_1 = 0$, $\omega_2 = 0$ under the following condition: at each point $(\bar x, \bar y, \bar z)$ of the curve the 2-forms $df\,dg$ and $\omega_1\omega_2$ are non-zero and are multiples of each other. Geometrically this means that the tangent line to the curve ($df = 0$, $dg = 0$) is indeed a line and coincides with the line specified by the equations ($\omega_1 = 0$, $\omega_2 = 0$) (see Exercise 1).
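Expanding $\omega_1\omega_2$ gives $(B_1C_2 - C_1B_2)\,dy\,dz + (C_1A_2 - A_1C_2)\,dz\,dx + (A_1B_2 - B_1A_2)\,dx\,dy$. Its coefficients are the components of the cross product of $(A_1, B_1, C_1)$ and $(A_2, B_2, C_2)$. That vector is also a direction vector of the line defined by the two equations. Below is a minimal sketch of the non-singularity test at a single point. The representation of a 1-form by the triple of its coefficient values at $(\bar x, \bar y, \bar z)$ is an assumption made for the example.

```python
def wedge(w1, w2):
    """Coefficients of w1 w2 on (dy dz, dz dx, dx dy), where w = (A, B, C) means A dx + B dy + C dz."""
    a1, b1, c1 = w1
    a2, b2, c2 = w2
    return (b1 * c2 - c1 * b2, c1 * a2 - a1 * c2, a1 * b2 - b1 * a2)


def is_nonsingular(w1, w2):
    """The pair w1 = 0, w2 = 0 is non-singular at the point iff the 2-form w1 w2 is not zero."""
    return any(coefficient != 0 for coefficient in wedge(w1, w2))


print(wedge((1, 0, 0), (0, 1, 0)))           # dx dy: (0, 0, 1), the z-direction
print(is_nonsingular((1, 2, 3), (2, 4, 6)))  # False: the equations have rank 1
w1, w2 = (1, 2, 3), (0, 1, -1)
direction = wedge(w1, w2)
print([sum(p * q for p, q in zip(w, direction)) for w in (w1, w2)])   # [0, 0]
```

The last line confirms that the wedge coefficients give a vector satisfying both linear equations, i.e. a direction along the specified line.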

In general, a set of $k$ differential equations $\omega_1 = 0$, $\omega_2 = 0$, ..., $\omega_k = 0$ in $n$ variables ($\omega_i = A_{i1}\,dx_1 + A_{i2}\,dx_2 + \cdots + A_{in}\,dx_n$) is said to be non-singular at a point $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ if the k-form $\omega_1\omega_2\cdots\omega_k$ is not zero at that point. A non-singular solution of such a set of equations is an $(n - k)$-dimensional manifold $f_1 = \text{const.}$, $f_2 = \text{const.}$, ..., $f_k = \text{const.}$ with the following property: the k-form $df_1\,df_2\cdots df_k$ is a non-zero multiple of $\omega_1\omega_2\cdots\omega_k$ at each point of the manifold.*

A set of $k$ differential equations $\omega_1 = 0$, $\omega_2 = 0$, ..., $\omega_k = 0$ in $n$ variables $(x_1, x_2, \ldots, x_n)$ is said to be integrable near a point $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ if two things hold: they are non-singular at that point, and through every point near $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ there is a non-singular solution of the equations. In order that a set of differential equations be integrable it is necessary and sufficient (assuming that the 1-forms $\omega_i$ are differentiable) that certain conditions be satisfied. Specifically, the theorem is the following:




Theorem

Let $\omega_1, \omega_2, \ldots, \omega_k$ be differentiable 1-forms in $n$ variables $(x_1, x_2, \ldots, x_n)$, and let $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ be a point at which the k-form $\omega_1\omega_2\cdots\omega_k$ is not zero. Suppose there is a non-singular solution of the differential equations $\omega_1 = 0$, $\omega_2 = 0$, ..., $\omega_k = 0$ through the point $(\bar x_1, \bar x_2, \ldots, \bar x_n)$. Then the $(k + 2)$-forms $\omega_1\omega_2\cdots\omega_k\,d\omega_i$ ($i = 1, 2, \ldots, k$) must all be zero at $(\bar x_1, \bar x_2, \ldots, \bar x_n)$. Therefore if the equations $\omega_1 = 0$, $\omega_2 = 0$, ..., $\omega_k = 0$ are integrable near $(\bar x_1, \bar x_2, \ldots, \bar x_n)$, these $(k + 2)$-forms must be identically zero near $(\bar x_1, \bar x_2, \ldots, \bar x_n)$. Conversely, suppose these $(k + 2)$-forms are identically zero near $(\bar x_1, \bar x_2, \ldots, \bar x_n)$,

(1) $$\omega_1\omega_2\cdots\omega_k\,d\omega_i = 0 \qquad (i = 1, 2, \ldots, k).$$

Then there exist $\varepsilon > 0$ and functions $f_1, f_2, \ldots, f_k$ defined within $\varepsilon$ of $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ such that the manifolds $f_1 = \text{const.}$, $f_2 = \text{const.}$, ..., $f_k = \text{const.}$ are non-singular solutions of the differential equations $\omega_1 = 0$, $\omega_2 = 0$, ..., $\omega_k = 0$.


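The conditions (1) are called the integrability conditions for the differential equations $\omega_1 = 0$, $\omega_2 = 0$, ..., $\omega_k = 0$. Note that for $k = n - 1$ they are automatically fulfilled (an $(n + 1)$-form in $n$ variables is necessarily zero). For one equation $A\,dx + B\,dy + C\,dz = 0$ in three variables there is a single condition

$$(A\,dx + B\,dy + C\,dz)\left[\left(\frac{\partial C}{\partial y} - \frac{\partial B}{\partial z}\right)dy\,dz + \left(\frac{\partial A}{\partial z} - \frac{\partial C}{\partial x}\right)dz\,dx + \left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right)dx\,dy\right] = 0,$$

that is,

$$A\left(\frac{\partial C}{\partial y} - \frac{\partial B}{\partial z}\right) + B\left(\frac{\partial A}{\partial z} - \frac{\partial C}{\partial x}\right) + C\left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) = 0.$$

The theorem states that the differential equation $A\,dx + B\,dy + C\,dz = 0$ describes a family of surfaces in space (by specifying their tangent planes) if and only if this condition is satisfied. In general, if the integrability conditions are written out in terms of the coefficient functions, they give $k\binom{n}{k+2}$ conditions on the coefficient functions; by a method similar to the method of Lagrange multipliers these can be reduced to a set of just $k\binom{n-k}{2}$ conditions (Exercise 6). The remainder of this section is devoted to proving the above theorem.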
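For readers who want to test the conditions (1) on concrete examples, here is a minimal numerical sketch in Python. It is not from the text: it stores a constant form at a point as a dictionary from increasing index tuples to coefficients (a representation assumed here for illustration), computes $d\omega_i$ by central differences, and reports the largest coefficient of $\omega_1\omega_2\cdots\omega_k\,d\omega_i$ at one point. A value near zero is consistent with (1) at that point; it is a spot check, not a proof of integrability. All function names are hypothetical.

```python
from itertools import combinations


def perm_sign(seq):
    """Sign of the permutation that sorts a sequence of distinct indices."""
    inversions = sum(1 for i in range(len(seq))
                     for j in range(i + 1, len(seq)) if seq[i] > seq[j])
    return -1 if inversions % 2 else 1


def wedge(a, b):
    """Wedge product of constant forms stored as {increasing index tuple: coefficient}."""
    out = {}
    for I, x in a.items():
        for J, y in b.items():
            if set(I) & set(J):
                continue  # a repeated dx_i makes the product vanish
            key = tuple(sorted(I + J))
            out[key] = out.get(key, 0.0) + perm_sign(I + J) * x * y
    return out


def one_form_at(coeffs, p):
    """The 1-form sum_j A_j dx_j at the point p (coeffs is a list of functions A_j)."""
    return {(j,): A(p) for j, A in enumerate(coeffs)}


def d_one_form_at(coeffs, p, h=1e-5):
    """d(sum_j A_j dx_j) = sum_{i<j} (dA_j/dx_i - dA_i/dx_j) dx_i dx_j, by central differences."""
    def partial(f, i):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        return (f(plus) - f(minus)) / (2 * h)
    return {(i, j): partial(coeffs[j], i) - partial(coeffs[i], j)
            for i, j in combinations(range(len(p)), 2)}


def integrability_defects(forms, p):
    """For each i, the largest |coefficient| of omega_1 ... omega_k d(omega_i) at p."""
    product = {(): 1.0}
    for coeffs in forms:
        product = wedge(product, one_form_at(coeffs, p))
    defects = []
    for coeffs in forms:
        top = wedge(product, d_one_form_at(coeffs, p))
        defects.append(max((abs(v) for v in top.values()), default=0.0))
    return defects


p3 = [0.3, -1.2, 0.7]
exact = [lambda q: q[1] * q[2], lambda q: q[0] * q[2], lambda q: q[0] * q[1]]  # d(xyz)
twisted = [lambda q: -q[1], lambda q: q[0], lambda q: 1.0]                    # -y dx + x dy + dz
print(integrability_defects([exact], p3))    # essentially [0.0]
print(integrability_defects([twisted], p3))  # essentially [2.0]

# Two equations in (x, y, z, t): dx - y dz = 0 and dt = 0.
w1 = [lambda q: 1.0, lambda q: 0.0, lambda q: -q[1], lambda q: 0.0]
w2 = [lambda q: 0.0, lambda q: 0.0, lambda q: 0.0, lambda q: 1.0]
print(integrability_defects([w1, w2], [0.2, 0.5, -0.4, 1.0]))  # essentially [1.0, 0.0]
```

In these examples $d(xyz)$ passes the test, $-y\,dx + x\,dy + dz$ gives $\omega\,d\omega = 2\,dx\,dy\,dz$, and for the pair $dx - y\,dz$, $dt$ in $(x, y, z, t)$ the first condition fails, with $\omega_1\omega_2\,d\omega_1 = -dx\,dy\,dz\,dt$.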




Proof

Let $\omega_1 = 0$, ..., $\omega_k = 0$ be as in the theorem, and let $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ be a point through which there is a solution manifold of these equations. It is to be shown that $\omega_1\omega_2\cdots\omega_k\,d\omega_i = 0$ at $(\bar x_1, \ldots, \bar x_n)$ for $i = 1, 2, \ldots, k$. By assumption the solution manifold is of the form $f_i(x_1, x_2, \ldots, x_n) = \text{const.}$ ($i = 1, 2, \ldots, k$), where $f_1, f_2, \ldots, f_k$ are differentiable functions defined near $(\bar x_1, \ldots, \bar x_n)$ such that $df_1\,df_2\cdots df_k \neq 0$.

Choose new coordinates $y_1, y_2, \ldots, y_n$ on $\mathbb{R}^n$ near $(\bar x_1, \ldots, \bar x_n)$ in which the solution manifold is $\{y_1 = 0, y_2 = 0, \ldots, y_k = 0\}$. (For example, take $y_i = f_i(x_1, \ldots, x_n) - f_i(\bar x_1, \ldots, \bar x_n)$ for $i = 1, 2, \ldots, k$ and select $y_{k+1}, \ldots, y_n$ from among the coordinates $x_1, x_2, \ldots, x_n$ so that $dy_1\,dy_2\cdots dy_n \neq 0$.) When $\omega_1, \omega_2, \ldots, \omega_k$ are expressed in terms of the coordinates $y$, the $k$-form $\omega_1\omega_2\cdots\omega_k$ is a multiple of $dy_1\,dy_2\cdots dy_k$ on the solution manifold $\{y_1 = 0, \ldots, y_k = 0\}$; hence, setting $\omega_i = A_{i1}\,dy_1 + \cdots + A_{in}\,dy_n$, it follows (see Exercise 2) that $\omega_i$ is a combination of $dy_1, dy_2, \ldots, dy_k$ on the solution manifold. Thus on the manifold $\{y_1 = 0, \ldots, y_k = 0\}$ the functions $A_{ij}$ are identically zero for $j > k$; hence their partial derivatives with respect to $y_{k+1}, y_{k+2}, \ldots, y_n$ are also zero on this plane. Thus when $d\omega_i$ is expressed in terms of the $dy$'s, $d\omega_i = \sum B_{uv}\,dy_u\,dy_v$, the terms in which both $u$ and $v$ are greater than $k$ are all zero at points of $\{y_1 = 0, \ldots, y_k = 0\}$. Since $\omega_1\omega_2\cdots\omega_k$ is a multiple of $dy_1\,dy_2\cdots dy_k$ at such points, it follows that $\omega_1\omega_2\cdots\omega_k\,d\omega_i$ is zero at such points, as was to be shown.

The proof of the remaining half of the theorem is facilitated by the observation that the given equations $\omega_1 = 0$, $\omega_2 = 0$, ..., $\omega_k = 0$ can be replaced by any other system $\omega'_1 = 0$, $\omega'_2 = 0$, ..., $\omega'_k = 0$ in which

$$\omega'_i = \sum_{j=1}^{k} b_{ij}\,\omega_j, \qquad \omega_i = \sum_{j=1}^{k} c_{ij}\,\omega'_j \qquad (i = 1, 2, \ldots, k), \tag{2}$$

where the coefficients $b_{ij}$, $c_{ij}$ are differentiable functions of $(x_1, x_2, \ldots, x_n)$. Since $\omega'_1\omega'_2\cdots\omega'_k$ is a multiple of $\omega_1\omega_2\cdots\omega_k$ and vice versa, a solution manifold of $\{\omega_1 = 0, \omega_2 = 0, \ldots, \omega_k = 0\}$ is a solution manifold of $\{\omega'_1 = 0, \omega'_2 = 0, \ldots, \omega'_k = 0\}$ and conversely. On the other hand the equations $\{\omega_i = 0\}$ satisfy the integrability conditions if and only if the equations




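$\{\omega'_i = 0\}$ do, because

$$\omega'_1\omega'_2\cdots\omega'_k\,d\omega_i = \sum_{j=1}^{k} \omega'_1\omega'_2\cdots\omega'_k\,[c_{ij}\,d\omega'_j + dc_{ij}\,\omega'_j] = 0 \quad \text{if} \quad \omega'_1\omega'_2\cdots\omega'_k\,d\omega'_j = 0 \ \text{(all } j)$$

(the terms containing $dc_{ij}\,\omega'_j$ vanish because $\omega'_j$ occurs twice), and $\omega_1\omega_2\cdots\omega_k\,d\omega_i$ is a multiple of $\omega'_1\omega'_2\cdots\omega'_k\,d\omega_i$; similarly $\omega_1\omega_2\cdots\omega_k\,d\omega'_i = 0$ if $\omega_1\omega_2\cdots\omega_k\,d\omega_j = 0$ (all $j$). Thus instead of showing that the given equations $\{\omega_i = 0\}$ are integrable it suffices to show that some combination of them is integrable.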

The case $k = n - 1$ is the most important one. In this case the integrability conditions are automatically satisfied and the theorem is essentially the existence theorem for ordinary differential equations (§7.4). Specifically the case $k = n - 1$ can be deduced from the theorem of Chapter 7 as follows:


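Given differentiable 1-forms $\omega_1, \omega_2, \ldots, \omega_{n-1}$ defined near $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ with $\omega_1\omega_2\cdots\omega_{n-1} \neq 0$ at $(\bar x_1, \ldots, \bar x_n)$, one can assume, by reordering the coordinates if necessary, that the $dx_1\,dx_2\cdots dx_{n-1}$ component of $\omega_1\omega_2\cdots\omega_{n-1}$ is not zero. Then the equations

$$\begin{aligned}
\omega_1 &= A_{11}\,dx_1 + A_{12}\,dx_2 + \cdots + A_{1n}\,dx_n \\
&\;\;\vdots \\
\omega_{n-1} &= A_{n-1,1}\,dx_1 + A_{n-1,2}\,dx_2 + \cdots + A_{n-1,n}\,dx_n
\end{aligned}$$

can be solved near $(\bar x_1, \ldots, \bar x_n)$ to give

$$\begin{aligned}
dx_1 &= B_{11}\,\omega_1 + \cdots + B_{1,n-1}\,\omega_{n-1} + B_{1n}\,dx_n \\
&\;\;\vdots \\
dx_{n-1} &= B_{n-1,1}\,\omega_1 + \cdots + B_{n-1,n-1}\,\omega_{n-1} + B_{n-1,n}\,dx_n.
\end{aligned}$$

Setting $\omega'_i = \sum_{j=1}^{n-1} B_{ij}\,\omega_j$ and $C_i = B_{in}$, this gives $\omega'_i = dx_i - C_i\,dx_n$. Since $\omega'_1\omega'_2\cdots\omega'_{n-1} \neq 0$, it suffices to prove that the equations $\{dx_i - C_i\,dx_n = 0;\ i = 1, 2, \ldots, n - 1\}$ are integrable in order to prove that the given equations $\{\omega_i = 0\}$ are integrable.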
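At a single point the reduction to $dx_i - C_i\,dx_n$ is plain linear algebra: if $M$ is the matrix of the $dx_1, \ldots, dx_{n-1}$ coefficients of $\omega_1, \ldots, \omega_{n-1}$ and $a$ is the column of $dx_n$ coefficients, then $M^{-1}\omega = dx' + M^{-1}a\,dx_n$, so $C = -M^{-1}a$. The Python sketch below is an illustration, not part of the text: it evaluates this with exact fractions at one point, the function names are hypothetical, and in the proof the $B_{ij}$ and $C_i$ are functions obtained by doing this at every point.

```python
from fractions import Fraction


def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination in exact arithmetic (M square, invertible)."""
    m = len(M)
    rows = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for col in range(m):
        # Pick a row with a non-zero entry in this column (raises StopIteration if singular).
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [v / lead for v in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [vr - factor * vc for vr, vc in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


def normal_form_coefficients(A):
    """A: (n-1) x n coefficients of omega_1..omega_{n-1} at one point, with the first
    n-1 columns invertible.  Returns C_1..C_{n-1} with omega'_i = dx_i - C_i dx_n."""
    M = [row[:-1] for row in A]   # coefficients of dx_1, ..., dx_{n-1}
    a = [row[-1] for row in A]    # coefficients of dx_n
    return [-v for v in solve(M, a)]


# omega_1 = dx_1 + dx_2 + 2 dx_3, omega_2 = dx_2 + 3 dx_3
print(normal_form_coefficients([[1, 1, 2], [0, 1, 3]]))  # [Fraction(1, 1), Fraction(-3, 1)]
```

Here $\omega'_1 = \omega_1 - \omega_2 = dx_1 - dx_3$ and $\omega'_2 = \omega_2 = dx_2 + 3\,dx_3$, which matches $C = (1, -3)$.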

Now the existence theorem for ordinary differential equations implies that there is a solution $(x_1(t), x_2(t), \ldots, x_n(t))$ of the differential equations

$$\frac{dx_1}{dt} = C_1(x_1(t), \ldots, x_n(t)), \quad \ldots, \quad \frac{dx_{n-1}}{dt} = C_{n-1}(x_1(t), \ldots, x_n(t)), \quad \frac{dx_n}{dt} = 1$$

satisfying $(x_1(0), \ldots, x_n(0)) = (x_1^0, x_2^0, \ldots, x_n^0)$ for any $(x_1^0, x_2^0, \ldots, x_n^0)$ near $(\bar x_1, \bar x_2, \ldots, \bar x_n)$. Moreover, the solution depends differentiably on the initial condition $(x_1^0, x_2^0, \ldots, x_n^0)$. Taking the initial condition to be $(\bar x_1 + y_1, \bar x_2 + y_2, \ldots, \bar x_{n-1} + y_{n-1}, \bar x_n)$, it follows that there is a differentiable map

$$g:\quad x_i = g_i(y_1, y_2, \ldots, y_{n-1}, t) \qquad (i = 1, 2, \ldots, n)$$

defined for $(y_1, y_2, \ldots, y_{n-1}, t)$ within $\varepsilon$ of $(0, 0, \ldots, 0)$ such that

$$g_i(y_1, \ldots, y_{n-1}, 0) = \bar x_i + y_i \quad (i = 1, 2, \ldots, n - 1), \qquad g_n(y_1, \ldots, y_{n-1}, 0) = \bar x_n,$$

and such that

$$\frac{\partial g_i}{\partial t}(y_1, \ldots, y_{n-1}, t) = C_i[g(y_1, \ldots, y_{n-1}, t)] \quad (i = 1, 2, \ldots, n - 1), \qquad \frac{\partial g_n}{\partial t}(y_1, \ldots, y_{n-1}, t) = 1.$$

On the plane $t = 0$ the pullback of $dx_i$ differs from $dy_i$ only by a multiple of $dt$ ($i = 1, 2, \ldots, n - 1$), whereas the pullback of $dx_n$ is identically $dt$ at all points (because $g_n = \bar x_n + t$). Hence the pullback of $dx_1\,dx_2\cdots dx_n$ at $(0, 0, \ldots, 0)$ is $dy_1\,dy_2\cdots dy_{n-1}\,dt$. Therefore the map $g$ is invertible near $(0, 0, \ldots, 0)$, so $(y_1, y_2, \ldots, y_{n-1}, t)$ can be expressed as functions of $(x_1, x_2, \ldots, x_n)$ near $(\bar x_1, \bar x_2, \ldots, \bar x_n)$. Now the map $g$ was chosen in such a way that the pullback of $dx_i - C_i\,dx_n$ has no term in $dt$. Thus

$$dx_i - C_i\,dx_n = \sum_{j=1}^{n-1} a_{ij}\,dy_j.$$

To prove that the given equations are integrable it suffices, therefore, to prove that the equations $\{dy_i = 0;\ i = 1, 2, \ldots, n - 1\}$ are integrable. But the lines $\{y_i = \text{const.};\ i = 1, 2, \ldots, n - 1\}$ are solutions of these equations and the case $k = n - 1$ follows.
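The map $g$ in this argument can be approximated numerically. The Python sketch below is an illustration under stated assumptions: it uses the classical fourth-order Runge-Kutta method (our choice, not the text's) to integrate $dx_i/dt = C_i$, $dx_n/dt = 1$ from $(\bar x_1 + y_1, \ldots, \bar x_{n-1} + y_{n-1}, \bar x_n)$, and then checks by a central difference that the $dt$-coefficient of the pullback of $dx_i - C_i\,dx_n$ is numerically zero. The example $C_1 = x_1$ with $n = 2$ is hypothetical; its solution curves are $x_1 e^{-x_2} = \text{const.}$

```python
import math


def flow_map(C, xbar, y, t, steps=200):
    """Approximate g(y_1..y_{n-1}, t): integrate dx_i/ds = C_i(x) (i < n), dx_n/ds = 1
    from (xbar_1 + y_1, ..., xbar_{n-1} + y_{n-1}, xbar_n) to s = t with classical RK4."""
    x = [xb + yi for xb, yi in zip(xbar[:-1], y)] + [xbar[-1]]

    def field(p):
        return [Ci(p) for Ci in C] + [1.0]

    h = t / steps
    for _ in range(steps):
        k1 = field(x)
        k2 = field([xi + h / 2 * ki for xi, ki in zip(x, k1)])
        k3 = field([xi + h / 2 * ki for xi, ki in zip(x, k2)])
        k4 = field([xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


def dt_coefficient(C, xbar, y, t, i, h=1e-4):
    """dt-coefficient of the pullback of dx_i - C_i dx_n under g, by a central difference."""
    plus = flow_map(C, xbar, y, t + h)
    minus = flow_map(C, xbar, y, t - h)
    here = flow_map(C, xbar, y, t)
    dg = [(p - m) / (2 * h) for p, m in zip(plus, minus)]
    return dg[i] - C[i](here) * dg[-1]


# Hypothetical example with n = 2: the equation dx_1 - x_1 dx_2 = 0 near (1, 0).
C = [lambda p: p[0]]
xbar = [1.0, 0.0]
x1, x2 = flow_map(C, xbar, [0.5], 1.0)
print(x1, 1.5 * math.e, x2)                     # x1 close to 1.5 e, x2 close to 1.0
print(dt_coefficient(C, xbar, [0.5], 1.0, 0))   # close to 0
```

A fixed step count keeps the sketch short; an adaptive integrator would be the usual choice when the $C_i$ vary rapidly.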

The case $n - k > 1$ can be deduced from the case $n - k = 1$ as follows: Given $\omega_1, \omega_2, \ldots, \omega_k$ and given $(\bar x_1, \bar x_2, \ldots, \bar x_n)$, one can reorder the coordinates and put the equations in the form

$$dx_i - \sum_{j=k+1}^{n} B_{ij}\,dx_j = 0 \qquad (i = 1, 2, \ldots, k) \tag{3}$$

as was done above in the case $n - k = 1$. Adding to these $k$ equations the equations $dx_{k+1} = 0$, ...,
$dx_{n-1} = 0$ gives $n - 1$ equations which, by the above, can be solved. The solution can be described by giving a new system of coordinates $(y_1, y_2, \ldots, y_n)$ near $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ in which the lines $y_i = \text{const.}$ ($i = 1, 2, \ldots, n - 1$) are solution curves. This implies that the expression of $\omega_i$ in terms of $y_1, y_2, \ldots, y_n$ contains no term in $dy_n$. Since the reduction of $\omega_1 = 0$, $\omega_2 = 0$, ..., $\omega_k = 0$ to the form (3) relative to the coordinates $y$ introduces no terms in $dy_n$, one can assume at the outset that the given system is in the form (3) with $B_{in} = 0$ ($i = 1, 2, \ldots, k$). Then for $k < j \le n - 1$ the $(k + 2)$-form $\omega_1\omega_2\cdots\omega_k\,d\omega_i$ contains just one term in $dx_1\,dx_2\cdots dx_k\,dx_n\,dx_j$, and the coefficient of this term is $-\partial B_{ij}/\partial x_n$. Hence the integrability conditions imply

$$\frac{\partial B_{ij}}{\partial x_n} = 0 \qquad (i = 1, 2, \ldots, k;\ j = k + 1, \ldots, n - 1).$$

Thus the equations (3) do not involve the variable $x_n$, and the solution of the given system is reduced to the solution of $k$ equations in $n - 1$ variables. Repeating this process $j$ times reduces the given equations to $k$ equations in $n - j$ variables. When $j = n - k$ the equations (3) become simply $dx_i = 0$ ($i = 1, 2, \ldots, k$), with the solutions $x_i = \text{const.}$ ($i = 1, 2, \ldots, k$). This completes the proof of the theorem.

Exercises

1 (a) Show that two equations in three unknowns

$$A(x - \bar x) + B(y - \bar y) + C(z - \bar z) = 0, \qquad D(x - \bar x) + E(y - \bar y) + F(z - \bar z) = 0$$

(where $A, B, \ldots, F$ are numbers) define a line in $xyz$-space if and only if $(A\,dx + B\,dy + C\,dz)(D\,dx + E\,dy + F\,dz) \neq 0$. [The Implicit Function Theorem for Affine Maps.]

(b) Let $\omega_1, \omega_2$ and $\omega'_1, \omega'_2$ be two pairs of constant 1-forms in $xyz$ such that $\omega_1\omega_2 \neq 0$ and $\omega'_1\omega'_2 \neq 0$. Show that $\omega'_1\omega'_2$ is a multiple of $\omega_1\omega_2$ if and only if there exist numbers $b_{ij}$ such that

$$\omega'_1 = b_{11}\,\omega_1 + b_{12}\,\omega_2, \qquad \omega'_2 = b_{21}\,\omega_1 + b_{22}\,\omega_2, \tag{*}$$

and that when this is the case, the numbers $b_{ij}$ are uniquely determined. [If (*) holds, then obviously $\omega'_1\omega'_2$ is a multiple of $\omega_1\omega_2$. To prove the converse, choose a constant 1-form $\omega_3$ such that $\omega_1\omega_2\omega_3 \neq 0$. Every 1-form $\omega'$ can then be written $\omega' = a_1\omega_1 + a_2\omega_2 + a_3\omega_3$. The number $a_1$ is determined by the fact that it is the coefficient of $\omega_1\omega_2\omega_3$ in $\omega'\omega_2\omega_3$; similarly for $a_2$, $a_3$.]

(c) Show that two pairs of equations as in (a) describe the same line in $xyz$-space if and only if the corresponding 2-forms are multiples of each other.

2 Let $\omega_1, \omega_2, \ldots, \omega_k$ and $\omega'_1, \omega'_2, \ldots, \omega'_k$ be two sets of 1-forms in $n$ variables such that $\omega_1\omega_2\cdots\omega_k \neq 0$ and $\omega'_1\omega'_2\cdots\omega'_k \neq 0$ at $(\bar x_1, \bar x_2, \ldots, \bar x_n)$. Show that $\omega'_1\omega'_2\cdots\omega'_k$ is a multiple of $\omega_1\omega_2\cdots\omega_k$ at $(\bar x_1, \ldots, \bar x_n)$ if and only if there exist numbers $b_{ij}$, $c_{ij}$ such that (2) is satisfied at $(\bar x_1, \ldots, \bar x_n)$. Conclude that the analog of Exercise 1 is true for $(n - k)$-dimensional affine manifolds in $n$-space. [Use the method of 1(b).]

3 Find the integrability conditions for two differential equations in four variables

$$A\,dx + B\,dy + C\,dz + D\,dt = 0, \qquad E\,dx + F\,dy + G\,dz + H\,dt = 0$$

as explicit relations on the coefficient functions $A, B, \ldots, H$ and their derivatives.

4 Use the theorem of this section to prove that there exists an integrating factor of any non-singular equation $A\,dx + B\,dy = 0$ in two variables, as was stated in §8.2.

5 Deduce the theorem of §7.4 from the theorem of this section.

6 Given $\omega_1, \omega_2, \ldots, \omega_k$ with $\omega_1\omega_2\cdots\omega_k \neq 0$ as in the theorem, choose additional 1-forms $\omega_{k+1}, \omega_{k+2}, \ldots, \omega_n$ such that $\omega_1\omega_2\cdots\omega_n \neq 0$. Show that every 2-form can be written in just one way as $\sum a_{uv}\,\omega_u\omega_v$, where the sum is over all $u, v$ satisfying $1 \le u < v \le n$. Use this observation to give $k\binom{n-k}{2}$ conditions which are equivalent to the integrability conditions (1).

8.6 Introduction to Homology Theory


Roughly speaking, homology theory is devoted to the question, “When is a $k$-form exact?” That is, “Given a $k$-form $\omega$, under what conditions is there a $(k - 1)$-form $\sigma$ such that $\omega = d\sigma$?”

For the sake of simplicity it will be assumed that the given $k$-form $\omega$ is differentiable. Then if $\omega$ is exact, say $\omega = d\sigma$, it follows that $d\omega = d[d\sigma] = 0$, i.e. $\omega$ is closed.


[Figure: the point $(x, y)$ and the origin $(0, 0)$ in the $xy$-plane.]


In other words, a necessary condition for a differentiable $k$-form to be exact is that it be closed. This condition is not sufficient, however, as is shown by the 1-form

$$\frac{x\,dy - y\,dx}{x^2 + y^2} \tag{1}$$

which is closed but not exact. To prove that this 1-form is not exact it suffices to observe that its integral over the circle $x^2 + y^2 = 1$ is not zero (this integral is $\pm 2\pi$, depending on the orientation), whereas if it were exact its integral over this closed curve would have to be zero. Intuitively it is clear that this fact is related to the fact that the 1-form (1) is not defined at the origin; in fact, the integral over the circle of a closed 1-form $\omega$ which is defined at all points of the $xy$-plane is $\int_{\partial D} \omega = \int_D d\omega = 0$, where $D = \{x^2 + y^2 \le 1\}$. Thus the existence of a closed form (1) which is not exact is related to the configuration or ‘topology’ of the domain of the form, which in this case is the $xy$-plane with the point $(0, 0)$ removed. An explicit theorem to this effect is the following.
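The value $\pm 2\pi$ is easy to confirm numerically. The Python sketch below is our illustration (the function names are hypothetical): it integrates $P\,dx + Q\,dy$ over the counterclockwise unit circle with the midpoint rule in the angle $\theta$, and spot-checks closedness by comparing $\partial Q/\partial x$ with $\partial P/\partial y$ at one point. For contrast it also integrates the everywhere-defined closed form $d(x^2 y) = 2xy\,dx + x^2\,dy$, whose integral must vanish.

```python
import math


def integrate_over_unit_circle(P, Q, n=2000):
    """Integral of P dx + Q dy over the counterclockwise circle x^2 + y^2 = 1,
    by the composite midpoint rule in theta (very accurate for smooth periodic integrands)."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        th = (k + 0.5) * h
        x, y = math.cos(th), math.sin(th)
        dx, dy = -math.sin(th), math.cos(th)      # dx/dtheta, dy/dtheta
        total += (P(x, y) * dx + Q(x, y) * dy) * h
    return total


def closedness_defect(P, Q, x, y, h=1e-5):
    """dQ/dx - dP/dy at (x, y) by central differences; zero for a closed 1-form."""
    dQdx = (Q(x + h, y) - Q(x - h, y)) / (2 * h)
    dPdy = (P(x, y + h) - P(x, y - h)) / (2 * h)
    return dQdx - dPdy


P1 = lambda x, y: -y / (x * x + y * y)   # the form (1): (x dy - y dx)/(x^2 + y^2)
Q1 = lambda x, y: x / (x * x + y * y)
print(integrate_over_unit_circle(P1, Q1), 2 * math.pi)   # both about 6.283185
print(closedness_defect(P1, Q1, 0.4, -0.9))              # close to 0

P2 = lambda x, y: 2 * x * y                               # d(x^2 y), defined everywhere
Q2 = lambda x, y: x * x
print(integrate_over_unit_circle(P2, Q2))                 # close to 0
```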


Theorem


Let $\omega$ be a differentiable 1-form $\omega = A\,dx + B\,dy$ defined at all points of the $xy$-plane except (possibly) the origin. Then $\omega$ is exact if and only if (i) $\omega$ is closed, and (ii) the integral of $\omega$ over the circle $x^2 + y^2 = 1$ is zero.


It is easy enough to see that this is the case: If $\omega$ is exact, i.e., if $\omega = df$ for some function $f$, then (i) and (ii) must hold. Conversely, given $\omega$ satisfying (i) and (ii), one can define a function $f(x, y)$ by

$$f(x, y) = \int_{(1, 0)}^{\left(\frac{x}{\sqrt{x^2+y^2}},\, \frac{y}{\sqrt{x^2+y^2}}\right)} \omega \;+\; \int_{\left(\frac{x}{\sqrt{x^2+y^2}},\, \frac{y}{\sqrt{x^2+y^2}}\right)}^{(x, y)} \omega,$$

where the first integral is over the circle and the second integral is over the radial line. The circular path from $(1, 0)$ to $\left(x/\sqrt{x^2+y^2},\ y/\sqrt{x^2+y^2}\right)$ can be taken in either sense, clockwise or counterclockwise, and can consist of any number of circuits, because by (ii) the integral over a complete circuit in either sense is zero. Now if $(x + \Delta x, y + \Delta y)$ is near $(x, y)$, then the integral of $\omega$ over the path consisting of the line segment from $(x, y)$ to $(x + \Delta x, y + \Delta y)$, the radial line from $(x + \Delta x, y + \Delta y)$ to the circle, the circular path to $\left(x/\sqrt{x^2+y^2},\ y/\sqrt{x^2+y^2}\right)$, and the radial line back to $(x, y)$ is zero, either because this path bounds a 2-dimensional region $D$ and therefore $\int_{\partial D} \omega = \int_D d\omega = 0$ by (i), or because this path lies entirely on one radial line so that each point covered is covered twice with opposite orientations. Thus $\int_{(x, y)}^{(x + \Delta x, y + \Delta y)} \omega$ is $f(x + \Delta x, y + \Delta y) - f(x, y)$, from which $\omega = df$; hence (i) and (ii) imply that $\omega$ is exact. This geometrical argument does not of course constitute a rigorous proof of the theorem. The purpose here is merely to illustrate the sorts of ideas which are involved in homology theory.

In general, homology theory is concerned with the following type of problem: Given an $n$-dimensional domain $D$, what conditions (if any) must be added to the condition $d\omega = 0$ in order to obtain necessary and sufficient conditions for a differentiable $k$-form $\omega$ on $D$ to be exact? The following theorems answer this question in a few specific cases.


Theorem


A differentiable 1-form $\omega = A\,dx + B\,dy + C\,dz$ which is defined at all points of $xyz$-space except (possibly) at the origin is exact if and only if it is closed. On the other hand, a differentiable 2-form $\omega = A\,dy\,dz + B\,dz\,dx + C\,dx\,dy$ defined at all points of $xyz$-space except (possibly) the origin is exact if and only if (i) it is closed, and (ii) its integral over the sphere $x^2 + y^2 + z^2 = 1$ is zero.


Theorem


A differentiable 1-form $\omega = A\,dx + B\,dy + C\,dz$ which is defined at all points of $xyz$-space except (possibly) on the circle $\{x^2 + y^2 = 4, z = 0\}$ and on the line $\{x = 0, y = 0\}$ is exact if and only if (i) it is closed, and (ii) it satisfies the two additional conditions

$$\int_{\gamma_1} \omega = 0, \qquad \int_{\gamma_2} \omega = 0,$$

where $\gamma_1$ is the circle $\{x^2 + y^2 = 1, z = 0\}$ and $\gamma_2$ is the circle $\{(x - 2)^2 + z^2 = 1, y = 0\}$. A differentiable 2-form $\omega = A\,dy\,dz + B\,dz\,dx + C\,dx\,dy$ defined at all points of $xyz$-space except (possibly) on the circle $\{x^2 + y^2 = 4, z = 0\}$ and on the line $\{x = 0, y = 0\}$ is exact if and only if (i) it is closed, and (ii) it satisfies the additional condition $\int_\gamma \omega = 0$, where $\gamma$ is the torus obtained by rotating the circle $\{(x - 2)^2 + z^2 = 1, y = 0\}$ around the $z$-axis.


Each of these theorems is of the form “a differentiable $k$-form $\omega$ defined at all points of the domain $D$ is exact if and only if (i) it is closed, and (ii) it satisfies the additional conditions $\int_{\gamma_1} \omega = 0$, $\int_{\gamma_2} \omega = 0$, ..., $\int_{\gamma_\nu} \omega = 0$.” (In the examples above, $\nu$ is 0, 1, or 2.) The $k$-dimensional domains of integration $\gamma_1, \gamma_2, \ldots, \gamma_\nu$ are called a ‘homology basis’ (in dimension $k$) of the $n$-dimensional domain $D$. In general a homology basis is defined as follows: Each of the elements $\gamma_1, \gamma_2, \ldots, \gamma_\nu$ of a ($k$-dimensional) homology basis is a $k$-dimensional ‘domain of integration’ contained in $D$. Specifically, each $\gamma$ is either a compact, oriented, differentiable, $k$-dimensional manifold-with-boundary (for example, the circle $x^2 + y^2 = 1$ or the sphere $x^2 + y^2 + z^2 = 1$ in the examples above); or else $\gamma$ is a collection of a finite number of such $k$-manifolds, with $\int_\gamma \omega$ defined to be the sum of the integrals of $\omega$ over each $k$-manifold in the collection (for example, the boundary of the square $\{|x| < 1, |y| < 1\}$ or of the cube $\{|x| < 1, |y| < 1, |z| < 1\}$). Such a collection of $k$-manifolds is called a $k$-chain. Thus a ($k$-dimensional) homology basis of $D$ is a set $\gamma_1, \gamma_2, \ldots, \gamma_\nu$ of $k$-chains $\gamma_i$ in $D$. To be a homology basis these $k$-chains must be such that:


(a) The conditions $\int_{\gamma_i} \omega = 0$ are necessary for $\omega$ to be exact, i.e., $\int_{\gamma_i} d\sigma = 0$ for all $(k - 1)$-forms $\sigma$. By Stokes’ Theorem, this means that $\int_{\partial\gamma_i} \sigma = 0$ for all $(k - 1)$-forms $\sigma$; hence $\gamma_i$ ‘has no boundary’. If $\gamma_i$ is a single $k$-manifold, this means literally that it is a manifold without boundary; whereas if $\gamma_i$ consists of several $k$-manifolds, then it means that the boundaries of the pieces cancel. For example, the six pieces which form the boundary of a cube have boundaries which cancel. (Each of the 12 edges of the cube occurs twice, with opposite orientations, in the boundary of the 6 faces.) A chain with this property is called a cycle, and the requirement is that the chains $\gamma_i$ in a homology basis be cycles. The $\gamma$’s in the theorems above are clearly cycles, because they are manifolds without boundary.

(b) The conditions $\int_{\gamma_i} \omega = 0$ ($i = 1, 2, \ldots, \nu$) and $d\omega = 0$ are independent. This means that for each $i = 1, 2, \ldots, \nu$ there is an $\omega$ such that $d\omega = 0$, $\int_{\gamma_j} \omega = 0$ ($j \neq i$), but $\int_{\gamma_i} \omega \neq 0$. (Also, it means that there is an $\omega$ such that $\int_{\gamma_j} \omega = 0$ for all $j$ and such that $d\omega \neq 0$; however, this imposes no conditions on the $\gamma$’s at all.) In the only case where this applies in the theorems above, namely the case of the two circles in the last theorem, this condition is easily shown to hold [Exercise 1].

(c) The conditions $d\omega = 0$ and $\int_{\gamma_i} \omega = 0$ ($i = 1, 2, \ldots, \nu$) are sufficient for a differentiable $k$-form $\omega$ to be exact. That is, if $\omega$ is differentiable and satisfies these conditions, then there is a differentiable $(k - 1)$-form $\sigma$ such that $d\sigma = \omega$.
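Condition (a) rests on the fact that the six faces of a cube, oriented as its boundary, have boundaries that cancel. The Python sketch below is our illustration, using the unit cube $[0, 1]^3$ rather than $\{|x| < 1, |y| < 1, |z| < 1\}$, with hypothetical function names. It orders the four vertices of each face counterclockwise as seen from outside and counts each edge $+1$ in one direction and $-1$ in the other. Every one of the 12 edges then has net count zero, whereas a single face alone does not.

```python
from collections import defaultdict


def cube_boundary_faces():
    """The 6 faces of the unit cube [0,1]^3, each a list of 4 vertices ordered
    counterclockwise as seen from outside (outward normal by the right-hand rule)."""
    faces = []
    for i in range(3):
        j, k = (i + 1) % 3, (i + 2) % 3            # cyclic order, so e_j x e_k = e_i
        for s in (0, 1):
            square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # counterclockwise in (x_j, x_k)
            if s == 0:
                square.reverse()                     # outward normal is -e_i on x_i = 0
            face = []
            for a, b in square:
                v = [0, 0, 0]
                v[i], v[j], v[k] = s, a, b
                face.append(tuple(v))
            faces.append(face)
    return faces


def net_edge_counts(faces):
    """Add up the oriented boundary edges of the faces: +1 for a -> b when a < b, else -1."""
    net = defaultdict(int)
    for face in faces:
        for a, b in zip(face, face[1:] + face[:1]):
            if a < b:
                net[(a, b)] += 1
            else:
                net[(b, a)] -= 1
    return dict(net)


faces = cube_boundary_faces()
net = net_edge_counts(faces)
print(len(net), all(c == 0 for c in net.values()))   # 12 True
print(sorted(net_edge_counts(faces[:1]).values()))   # one face alone: [-1, -1, 1, 1]
```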


This definition of ‘homology basis’ is only a definition and no assertion is made that there exists a homology basis for an arbitrary domain $D$. The purpose of the definition is to prescribe a specific form in which the answer to the question “When is a $k$-form exact?” can be given. Namely, the question is to be answered by giving a $k$-dimensional homology basis of the domain of the form; then the question “$\omega = d\sigma$?” is answered by testing the finite number of conditions $d\omega = 0$, $\int_{\gamma_1} \omega = 0$, ..., $\int_{\gamma_\nu} \omega = 0$, which are necessary and sufficient for $\omega = d\sigma$.

Note that unlike most of the subjects discussed in this 
book, the problem of finding a homology basis for a 
given domain is a global problem, depending on the 
whole domain and the way it fits together. Historically, 
the subject of algebraic topology (also called combina- 
torial topology) developed from the study of this global 
problem. 

The theorems stated above are not especially difficult to prove, but the techniques of proof belong more to the subject of topology than to calculus, so it is not appropriate to enter into them here. A theorem of this type which does properly belong in a calculus book is the theorem that every closed form is locally exact. In terms of homology this is a consequence of the fact that an $n$-dimensional cube is homologically trivial ($\nu = 0$ for all $k$). This fact is known as Poincaré’s Lemma.




Poincaré’s Lemma 


A differentiable $k$-form $\omega$ defined at all points of an $n$-dimensional cube $\{|x_1 - \bar x_1| < a, |x_2 - \bar x_2| < a, \ldots, |x_n - \bar x_n| < a\}$ in $\mathbb{R}^n$ is exact if and only if it is closed.


Proof 


If $d\sigma = \omega$ then $d\omega = d[d\sigma] = 0$, so the condition $d\omega = 0$ is, as always, necessary for $\omega = d\sigma$. It is to be shown that when the domain is a cube it is also sufficient. To this end let $f\colon \mathbb{R}^{n+1} \to \mathbb{R}^n$ be the map defined by

$$x_1 = \bar x_1 + t y_1, \quad x_2 = \bar x_2 + t y_2, \quad \ldots, \quad x_n = \bar x_n + t y_n,$$

where $(x_1, x_2, \ldots, x_n)$ are the coordinates of $\mathbb{R}^n$, $(\bar x_1, \bar x_2, \ldots, \bar x_n)$ is the center of the cube in question, and $(t, y_1, y_2, \ldots, y_n)$ are the coordinates on $\mathbb{R}^{n+1}$. This map is differentiable and carries points of the set $\{0 \le t \le 1, |y_1| < a, \ldots, |y_n| < a\}$ into the cube where $\omega$ is defined. Let $\int_0^1 f^*(\omega)$ denote the $(k - 1)$-form in $y_1, y_2, \ldots, y_n$ found by dropping all terms in $f^*(\omega)$ which do not contain $dt$, writing terms which do contain $dt$ with $dt$ in the first position, integrating the coefficients of these terms from $t = 0$ to $t = 1$, and dropping $dt$. Finally, let $\sigma$ be the pullback of $\int_0^1 f^*(\omega)$ under the map

$$y_1 = x_1 - \bar x_1, \quad y_2 = x_2 - \bar x_2, \quad \ldots, \quad y_n = x_n - \bar x_n.$$

Then $\sigma$ is a differentiable $(k - 1)$-form in $x_1, x_2, \ldots, x_n$. It will be shown that $d\sigma = \omega$ as desired.
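For $k = 1$, and for $k = 2$ in $\mathbb{R}^3$, the construction of $\sigma$ can be written out explicitly. Collecting the $dt$ terms of $f^*(\omega)$ gives $\sigma = \int_0^1 \sum_i A_i(\bar x + t y)\,y_i\,dt$ for a 1-form $\omega = \sum A_i\,dx_i$, and $\sigma = \sum_i G_i\,dx_i$ with $G = \int_0^1 t\,F(\bar x + t y) \times y\,dt$ for $\omega = F_1\,dy\,dz + F_2\,dz\,dx + F_3\,dx\,dy$, where $y = x - \bar x$. The Python sketch below evaluates these integrals with Simpson's rule (our choice of quadrature) so that $d\sigma = \omega$ can be spot-checked on examples; the function names and the example forms are hypothetical.

```python
import math


def simpson01(f, n=200):
    """Composite Simpson rule for the integral of f over [0, 1] (n must be even)."""
    h = 1.0 / n
    s = f(0.0) + f(1.0)
    for m in range(1, n):
        s += (4 if m % 2 else 2) * f(m * h)
    return s * h / 3


def sigma_for_1form(A, center, x):
    """k = 1: sigma(x) = integral_0^1 sum_i A_i(center + t*y) * y_i dt, with y = x - center."""
    y = [xi - ci for xi, ci in zip(x, center)]

    def integrand(t):
        p = [ci + t * yi for ci, yi in zip(center, y)]
        return sum(Ai(p) * yi for Ai, yi in zip(A, y))
    return simpson01(integrand)


def sigma_for_2form(F, center, x):
    """k = 2 in R^3: coefficients (G_1, G_2, G_3) of sigma = G_1 dx + G_2 dy + G_3 dz,
    G = integral_0^1 t * F(center + t*y) x y dt (cross product), y = x - center."""
    y = [xi - ci for xi, ci in zip(x, center)]

    def component(c):
        def integrand(t):
            p = [ci + t * yi for ci, yi in zip(center, y)]
            f = [Fi(p) for Fi in F]
            cross = (f[1] * y[2] - f[2] * y[1],
                     f[2] * y[0] - f[0] * y[2],
                     f[0] * y[1] - f[1] * y[0])
            return t * cross[c]
        return integrand
    return [simpson01(component(c)) for c in range(3)]


def curl_of_sigma(F, center, x, h=1e-4):
    """The dy dz, dz dx, dx dy coefficients of d(sigma) in the 2-form case."""
    def partial(c, i):   # d G_c / d x_i by central differences
        plus, minus = list(x), list(x)
        plus[i] += h
        minus[i] -= h
        return (sigma_for_2form(F, center, plus)[c]
                - sigma_for_2form(F, center, minus)[c]) / (2 * h)
    return [partial(2, 1) - partial(1, 2),
            partial(0, 2) - partial(2, 0),
            partial(1, 0) - partial(0, 1)]


center = [0.1, 0.2, -0.3]
point = [0.8, -0.5, 0.4]

# omega = d(y sin x + z^2): sigma should be that function minus its value at the center.
A = [lambda p: p[1] * math.cos(p[0]), lambda p: math.sin(p[0]), lambda p: 2 * p[2]]
potential = lambda p: p[1] * math.sin(p[0]) + p[2] ** 2
print(sigma_for_1form(A, center, point), potential(point) - potential(center))

# omega = y dy dz + z dz dx + x dx dy is closed (the divergence of (y, z, x) is 0).
F = [lambda p: p[1], lambda p: p[2], lambda p: p[0]]
print(curl_of_sigma(F, center, point), [point[1], point[2], point[0]])
```

In both cases the two printed values should agree to many digits, which is what $d\sigma = \omega$ predicts for these closed forms.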

Let $R$ be a small oriented $k$-dimensional rectangle lying in a coordinate direction in the cube $\{|x_i - \bar x_i| < a\}$. It will suffice to show that $\int_R \omega = \int_{\partial R} \sigma$ for every such rectangle $R$. Let $I \times R$ denote the oriented $(k + 1)$-dimensional rectangle in $\mathbb{R}^{n+1}$ consisting of all points $(t, y_1, y_2, \ldots, y_n)$ such that $0 \le t \le 1$ and such that $(\bar x_1 + y_1, \ldots, \bar x_n + y_n)$ is in $R$. By Stokes’ theorem

$$\int_{\partial(I \times R)} f^*(\omega) = \int_{I \times R} d[f^*(\omega)] = \int_{I \times R} f^*(d\omega) = 0,$$

because $\omega$ is closed. Now $\partial(I \times R)$ can be divided into three pieces, the two ends $\{0\} \times R$ and $\{1\} \times R$, and the sleeve $I \times \partial R$. The orientation of the sleeve as the boundary of $I \times R$ is opposite to the orientation of $I \times \partial R$ when $\partial R$ is oriented as the boundary of $R$, hence

$$-\int_{\{0\} \times R} f^*(\omega) + \int_{\{1\} \times R} f^*(\omega) = \int_{I \times \partial R} f^*(\omega).$$


The first integral on the left is zero because the map $f$ collapses this end to the single point $(\bar x_1, \bar x_2, \ldots, \bar x_n)$, whereas the second integral on the left is $\int_R \omega$ because the map $f$ on this end is the translation $x_i = y_i + \bar x_i$ ($i = 1, 2, \ldots, n$). Carrying out the integration with respect to $t$, the integral on the right becomes

$$\int_{\partial R} \left(\int_0^1 f^*(\omega)\right),$$

where $\int_0^1 f^*(\omega)$ is the $(k - 1)$-form defined above. Since this is $\int_{\partial R} \sigma$, the equation is $\int_R \omega = \int_{\partial R} \sigma$ as desired, and Poincaré’s Lemma is proved.


Exercises


1 Show that the two circles $\gamma_1$, $\gamma_2$ of the third theorem satisfy the independence condition (b). [Use cylindrical coordinates $(r, \theta, z)$. Then $\int_{\gamma_1} d\theta \neq 0$, $\int_{\gamma_2} d\theta = 0$. Find a closed 1-form in $r, z$ (defined for $(r, z) \neq (2, 0)$) whose integral over $\gamma_2$ is not zero, and show that its integral over $\gamma_1$ is zero. Express both 1-forms in terms of $x, y, z$.]


2 (a) Show that the boundary of the square $\{|x| < 1, |y| < 1\}$ is a cycle, that is, that the integral of $d\phi$ over this set of oriented curves is zero for any differentiable function $\phi$.


(b) By parameterizing each of the four sides of the square $\{|x| < 1, |y| < 1\}$, write the integral of

$$\frac{x\,dy - y\,dx}{x^2 + y^2}$$

over the boundary of the square as a sum of four definite integrals, combine into one term, and show that the integral is not zero. What is the value of this integral?


3 Give a geometrical ‘proof’, like the one in the text, that the circle $\{x^2 + y^2 = 1\}$ in the first theorem can be replaced by the boundary of the square $\{|x| < 1, |y| < 1\}$.


4 Give a geometrical ‘proof’ that the 1-dimensional homology of $\mathbb{R}^3$ with the origin removed is trivial (i.e. $\nu = 0$) by arguing that every point can be joined to the point $(1, 0, 0)$ by a curve which does not pass through the origin, and that two such curves are the boundary of a 2-dimensional manifold. [This argument is essentially the one given in the text in the case of $\mathbb{R}^2$ with the origin removed, but it leaves even more to the imagination.]


5 Give a geometrical ‘proof’ that the 2-dimensional homology of the cube $\{|x| < 1, |y| < 1, |z| < 1\}$ is trivial. [Given a closed 2-form $\omega$, define a 1-form $\sigma$ by defining the integral of $\sigma$ over a curve to be the integral of $\omega$ over the 2-manifold swept out by a line segment from the origin to the curve as the far end moves along the given curve. Argue that $d\sigma = \omega$. This is the geometrical idea behind the proof in the text of Poincaré’s Lemma.]


6 Give a geometrical ‘proof’ that a closed 2-form on the sphere $\{x^2 + y^2 + z^2 = 1\}$ is exact if its integral over the entire sphere is zero. [Repeat the argument of Exercise 5, replacing the line segment from the origin to the curve with the great circle arc from $(1, 0, 0)$ to the curve. The extra condition is needed to guarantee $\int_S d\sigma = \int_S \omega$ for small 2-dimensional manifolds $S$ which contain the antipodal point $(-1, 0, 0)$.]


7 ‘Prove’ that the sphere $\{x^2 + y^2 + z^2 = 1\}$ is a 2-dimensional homology basis for $\mathbb{R}^3$ with the origin removed. [This is a slight extension of Exercise 6.]


8 Find a basis of the 1-dimensional homology of the plane $\mathbb{R}^2$ with the two points $(\pm 1, 0)$ removed.


9 Show that part II of the Fundamental Theorem of Cal- 
culus as stated in §3.1 is, except for a differentiability assump- 
tion, a special case of Poincaré’s Lemma. Show the relation- 
ship between the proofs of these two facts. 


10 Betti numbers. Show that if $\gamma_1, \gamma_2, \ldots, \gamma_\nu$ and $\gamma'_1, \gamma'_2, \ldots, \gamma'_\mu$ are two $k$-dimensional homology bases of the same domain $D$, then $\mu = \nu$. That is, any two homology bases contain the same number of cycles. This natural number (or zero) is called the $k$th Betti number of the domain $D$. [Let $\omega_1, \omega_2, \ldots, \omega_\nu$ be closed $k$-forms on $D$ such that $\int_{\gamma_i} \omega_j$ is 0 if $i \neq j$ and 1 if $i = j$. This is possible by condition (b) of the definition of homology basis. Define the ‘period matrix’ $(a_{ij})$ by $a_{ij} = \int_{\gamma'_i} \omega_j$, so that $(a_{ij})$ is a $\mu \times \nu$ matrix. Show that if $\omega$ is a closed $k$-form on $D$, then its ‘$\gamma'$ periods’ can be obtained from its ‘$\gamma$ periods’ $b_j = \int_{\gamma_j} \omega$ by the formula

$$\int_{\gamma'_i} \omega = \sum_{j=1}^{\nu} a_{ij} b_j.$$

(Show first that $\omega - \sum b_j \omega_j$ is exact.) In the same way the $\gamma$ periods can be obtained from the $\gamma'$ periods. Thus $(a_{ij})$ defines a map $\mathbb{R}^\nu \to \mathbb{R}^\mu$ which is invertible; hence $\nu = \mu$.]


8.7 Flows


A ‘flow’ is an imaginary physical phenomenon in which 
space is filled with a moving fluid which consists of 
infinitely many particles. Such a phenomenon can be 
described mathematically in two quite different ways— 
by following the particles, and by standing still and 
counting the particles as they go by. 

The first of these descriptions consists of three func- 
tions of four variables 


$$x = f(a, b, c, t), \qquad y = g(a, b, c, t), \qquad z = h(a, b, c, t). \tag{1}$$


Here $t$ is time and the three coordinates $(a, b, c)$ can be considered as naming the particles; for fixed $(a, b, c)$ the values of (1) for varying values of $t$ describe a curve in $xyz$-space which is the trajectory of the particle named $(a, b, c)$. One assumes that for each fixed value of $t$ there is exactly one particle at each point of space and vice versa; that is, one assumes that the equations (1) can be put in the form


$$a = F(x, y, z, t), \qquad b = G(x, y, z, t), \qquad c = H(x, y, z, t) \tag{2}$$


giving the name of the particle which is at the point $(x, y, z)$ at the time $t$. The trajectories are then defined implicitly by $a = \text{const.}$, $b = \text{const.}$, $c = \text{const.}$ Another flow

$$x = f_1(\alpha, \beta, \gamma, t), \qquad y = g_1(\alpha, \beta, \gamma, t), \qquad z = h_1(\alpha, \beta, \gamma, t) \tag{1'}$$

is considered to be the same as the given flow (1) if it describes the same trajectories. If this is true, then the curves $a = \text{const.}$, $b = \text{const.}$, $c = \text{const.}$ are identical to the curves $\alpha = \text{const.}$, $\beta = \text{const.}$, $\gamma = \text{const.}$, which implies that $a, b, c$ can be written as functions of $\alpha, \beta, \gamma$ and vice versa (by the Implicit Function Theorem). Conversely, if $a, b, c$ are functions of $\alpha, \beta, \gamma$, then substitution of these functions into (1) gives a new flow (1') which is the same as (1).


The second method of describing a flow is to give a 3-form in $(x, y, z, t)$,

$$\omega = A\,dx\,dy\,dz + B\,dy\,dz\,dt + C\,dz\,dx\,dt + D\,dx\,dy\,dt, \tag{3}$$


describing the ‘number of trajectories which intersect a given 3-rectangle in $xyzt$-space’. The function $A(x, y, z, t)$ gives the ‘number’ of trajectories which intersect 3-rectangles in planes $t = \text{const.}$; that is, $A(x, y, z, t)$ is the density of the particles (per volume $dx\,dy\,dz$) at the time $t$ in the vicinity of the point $(x, y, z)$. The function $B(x, y, z, t)$ gives the ‘number’ of trajectories which intersect 3-rectangles in planes $x = \text{const.}$, that is, the number of particles which cross 2-rectangles $x = \text{const.}$ during time intervals $\{t \le \tau \le t + \Delta t\}$. $B(x, y, z, t)$ is therefore the rate of flow (per time $dt$) of particles across rectangles $x = \text{const.}$ (per oriented area $dy\,dz$). Similarly, $C$ and $D$ are rates of flow across surfaces $y = \text{const.}$, $z = \text{const.}$ respectively. Since any trajectory that enters a 4-rectangle in $xyzt$-space must also leave it, it follows that the total value of $\omega$ on the boundary of any 4-rectangle is zero (when orientations are assigned consistently), i.e., $d\omega = 0$. It will be assumed therefore that a 3-form (3) describing a flow is closed. In more physical language, the assumption $d\omega = 0$ means that any change in the number of particles in a region $D$ of $xyz$-space is accounted for by the flow of particles across $\partial D$.

Naturally one would suppose that any flow described by equations (1) could be described by a 3-form (3) and vice versa. The translation from one form of description to the other is accomplished as follows:

Using the map (1) the 3-form $\omega$ can be written in terms of $(a, b, c, t)$. Since the curves $a = \text{const.}$, $b = \text{const.}$, $c = \text{const.}$ are trajectories, it follows that there is no flow across surfaces $a = \text{const.}$, $b = \text{const.}$, or $c = \text{const.}$, and hence that the $db\,dc\,dt$, $dc\,da\,dt$, and $da\,db\,dt$ components of $\omega$ are zero. Therefore $\omega$ is of the form $\omega = E\,da\,db\,dc$. Moreover, since $d\omega = 0$ implies $\partial E/\partial t = 0$, the function $E$ is independent of $t$ and

$$\omega = E(a, b, c)\,da\,db\,dc.$$


The function $E(a, b, c)$ can be regarded as the “number of particles whose names $(a, b, c)$ lie in a 3-rectangle of $abc$-space,” i.e., the density (per $da\,db\,dc$) of the particles. This function is not determined by the description (1), which gives the trajectories but not the density of the particles; hence (1) must be supplemented by giving the 3-form $E(a, b, c)\,da\,db\,dc$. Then the closed 3-form $\omega$ in $(x, y, z, t)$ is found by putting the equations (1) in the form (2) (which is possible by assumption) and using (2) to express $E(a, b, c)\,da\,db\,dc$ in terms of $(x, y, z, t)$.

To find functions (1) and the 3-form $E(a, b, c)\,da\,db\,dc$ given the closed 3-form (3), one proceeds as follows: The flow does not determine the functions (1), but only the trajectories $a = \text{const.}$, $b = \text{const.}$, $c = \text{const.}$; that is, the flow determines the curves $da = 0$, $db = 0$, $dc = 0$. It is to be shown that the closed 3-form $\omega$ can be used to find these trajectories. In $xyzt$-coordinates the trajectories are naturally described by differential equations

$$\frac{dx}{dt} = u(x, y, z, t), \qquad \frac{dy}{dt} = v(x, y, z, t), \qquad \frac{dz}{dt} = w(x, y, z, t),$$

where $u(x, y, z, t)$ is the $x$-component of the velocity of the particle at the point $(x, y, z)$ at the time $t$, and similarly for $v$, $w$. These functions $u$, $v$, $w$ must have the property that the curves defined by the differential equations

$$dx - u\,dt = 0, \qquad dy - v\,dt = 0, \qquad dz - w\,dt = 0$$

are identical to the curves defined by

$$da = 0, \qquad db = 0, \qquad dc = 0.$$


This is true if and only if the 1-forms $da$, $db$, $dc$ can be expressed as combinations of the 1-forms $dx - u\,dt$, $dy - v\,dt$, $dz - w\,dt$ and vice versa, which is true if and only if the 3-form $da\,db\,dc$ is a multiple of

$$(dx - u\,dt)(dy - v\,dt)(dz - w\,dt) = dx\,dy\,dz - u\,dy\,dz\,dt - v\,dz\,dx\,dt - w\,dx\,dy\,dt.$$

Since $\omega$ is $E(a, b, c)\,da\,db\,dc$, this implies that $\omega$ is a multiple of $dx\,dy\,dz - u\,dy\,dz\,dt - v\,dz\,dx\,dt - w\,dx\,dy\,dt$; hence if $\omega$ is given, then the functions $u$, $v$, $w$ can be determined immediately from

$$u = -\frac{B}{A}, \qquad v = -\frac{C}{A}, \qquad w = -\frac{D}{A},$$



provided $A \neq 0$. By the existence theorem for ordinary differential equations the equations

$$dx - u\,dt = 0, \qquad dy - v\,dt = 0, \qquad dz - w\,dt = 0$$

then have a solution of the form $a = \text{const.}$, $b = \text{const.}$, $c = \text{const.}$, where

$$a = F(x, y, z, t), \qquad b = G(x, y, z, t), \qquad c = H(x, y, z, t). \tag{2}$$


If ω is given with A ≠ 0 and if u, v, w (hence a, b, c) are
determined in this way, then it follows that da db dc
is a multiple of (dx − u dt)(dy − v dt)(dz − w dt) =
dx dy dz − u dy dz dt − v dz dx dt − w dx dy dt, which
is in turn a multiple of ω. In particular the dx dy dz component
of da db dc is not zero and (locally) the equations
(2) can be put in the form

$$\begin{aligned} x &= f(a, b, c, t)\\ y &= g(a, b, c, t)\\ z &= h(a, b, c, t). \end{aligned} \tag{1}$$

When these functions are used to express ω in terms of
a, b, c, t, the result is of the form E(a, b, c, t) da db dc
(because ω is a multiple of da db dc by the choice of
a, b, c), hence of the form E(a, b, c) da db dc (because
dω = 0 by assumption, and d[E da db dc] = (∂E/∂t) dt da db dc
vanishes only if E does not depend on t).

These pseudo-physical statements can be summarized 
by the following mathematical theorem. 


Theorem 


Let ω = A dx dy dz + B dy dz dt + C dz dx dt +
D dx dy dt be a closed 3-form on xyzt-space and let
(x̄, ȳ, z̄, t̄) be a point at which A ≠ 0. Then there exist
functions

$$\begin{aligned} a &= F(x, y, z, t)\\ b &= G(x, y, z, t)\\ c &= H(x, y, z, t) \end{aligned} \tag{2}$$

and a 3-form E(a, b, c) da db dc such that ω is the pullback
of E(a, b, c) da db dc under (2) at all points
(x, y, z, t) near (x̄, ȳ, z̄, t̄). Since the dx dy dz component
of da db dc is not zero, these functions can be solved
to give

$$\begin{aligned} x &= f(a, b, c, t)\\ y &= g(a, b, c, t)\\ z &= h(a, b, c, t). \end{aligned} \tag{1}$$

The trajectories a = const., b = const., c = const.
determined by (1) depend only on the given 3-form ω.
Hence the 3-form ω describes a flow.


It is customary to denote the density A(x, y, z, t) by
ρ(x, y, z, t), so that the closed 3-form ω describing the
flow is

$$\omega = \rho\,dx\,dy\,dz - \rho u\,dy\,dz\,dt - \rho v\,dz\,dx\,dt - \rho w\,dx\,dy\,dt.$$

The equation dω = 0 then takes the form

$$\frac{\partial\rho}{\partial t} + \frac{\partial}{\partial x}(\rho u) + \frac{\partial}{\partial y}(\rho v) + \frac{\partial}{\partial z}(\rho w) = 0,$$

which is known as the ‘continuity equation’.
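As a concrete check of the continuity equation, consider a sample flow chosen for illustration (it is not taken from the text) in which the particles move as x = a(1 + t), y = b(1 + t), z = c(1 + t) with E(a, b, c) = 1. Then u = x/(1 + t), v = y/(1 + t), w = z/(1 + t), and the dx dy dz component of the pullback of da db dc gives ρ = (1 + t)^(−3). The sketch below estimates the left side of the continuity equation by central differences; the function names are hypothetical.

```python
def rho(x, y, z, t):
    """Density of the sample flow x = a(1+t), y = b(1+t), z = c(1+t) with E(a, b, c) = 1."""
    return (1.0 + t) ** -3


def velocity(x, y, z, t):
    """Velocity (u, v, w) of the particle at (x, y, z) at time t for the same flow."""
    s = 1.0 + t
    return (x / s, y / s, z / s)


def continuity_residual(x, y, z, t, h=1e-4):
    """Central-difference estimate of d(rho)/dt + d(rho u)/dx + d(rho v)/dy + d(rho w)/dz."""
    def mass_flux(i, px, py, pz, pt):
        return rho(px, py, pz, pt) * velocity(px, py, pz, pt)[i]

    drho_dt = (rho(x, y, z, t + h) - rho(x, y, z, t - h)) / (2 * h)
    divergence = (
        mass_flux(0, x + h, y, z, t) - mass_flux(0, x - h, y, z, t)
        + mass_flux(1, x, y + h, z, t) - mass_flux(1, x, y - h, z, t)
        + mass_flux(2, x, y, z + h, t) - mass_flux(2, x, y, z - h, t)
    ) / (2 * h)
    return drho_dt + divergence


print(continuity_residual(0.3, -1.2, 0.7, 0.5))  # close to 0
```

Keeping the same velocity but taking ρ = 1 would instead give a residual of 3/(1 + t), so this velocity field does not carry a uniform density.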

The density ρ is closely related to the integrating factor
ρ of §8.2. Specifically, the argument above can be used
to prove the following generalization to n dimensions.


Theorem


Let ω be an (n − 1)-form on Rⁿ and let (x̄₁, x̄₂, ..., x̄ₙ)
be a point of Rⁿ at which ω ≠ 0. Then there exist
functions

$$y_1 = f_1(x_1, x_2, \ldots, x_n), \quad \ldots, \quad y_{n-1} = f_{n-1}(x_1, x_2, \ldots, x_n)$$

and ρ(x₁, x₂, ..., xₙ),
defined near (x̄₁, x̄₂, ..., x̄ₙ), such that ρ ≠ 0 at
(x̄₁, x̄₂, ..., x̄ₙ) and such that ρω = dy₁ dy₂ ⋯ dy_{n−1}.
The (n − 1)-form ω is closed if and only if ρ is constant
on the curves y₁ = const., ..., y_{n−1} = const., that is, if and only if
ρ can be expressed as a function of (y₁, y₂, ..., y_{n−1}).
The curves yᵢ = const. are determined by the (n − 1)-form ω,
and if ω is closed then the number ∫_S ω can be imagined
as ‘the number of curves yᵢ = const. which intersect the
surface S’ when these curves are drawn with the correct density.


Exercises


1 A flow

$$\rho\,dx\,dy\,dz - \rho u\,dy\,dz\,dt - \rho v\,dz\,dx\,dt - \rho w\,dx\,dy\,dt$$

is said to be ‘divergence-free’ if the density ρ is independent
of t. Show that this is true if and only if the 2-form
ρu dy dz + ρv dz dx + ρw dx dy giving rate of flow (per time)
across surfaces in xyz-space is closed.


2 The second theorem of the text says that locally a closed
(n − 1)-form on an n-dimensional domain can be described
by curves. Give such a description of the following
(n − 1)-forms:

(a) dx on the xy-plane
(b) $\dfrac{x\,dy - y\,dx}{x^2 + y^2}$ on the xy-plane
(c) $\dfrac{x\,dx + y\,dy}{x^2 + y^2}$ on the xy-plane
(d) dx dy on xyz-space
(e) $\dfrac{x\,dy\,dz + y\,dz\,dx + z\,dx\,dy}{(x^2 + y^2 + z^2)^{3/2}}$ on xyz-space

Be sure to include the density with which the curves must be
drawn.


3 Prove the second theorem of the text. [Use 1/ρ in proving
that ω is closed if and only if ρ is a function of y₁, y₂, ...,
y_{n−1}.]


8.8 Applications to Mathematical Physics

The Heat Equation


Suppose that a solid body occupies a volume V of xyz-space,
and let T(x, y, z, t) be the temperature at the point
(x, y, z) of the solid at the time t. The heat equation is a
relationship which must be satisfied by the function
T(x, y, z, t) if the phenomenon of ‘temperature’ is adequately
described by the following assumptions:


(a) Changes in temperature can be accounted for by
the motion of a fictitious fluid called ‘heat’.

(b) The density of heat at (x, y, z) at time t is proportional
to the temperature. The constant of proportionality
c(x, y, z), which may depend on (x, y, z) but not on t,
is called the ‘heat capacity’ of the solid at (x, y, z):

amount of heat = (temp.) × (heat capacity) × (volume).

(c) The rate of flow of heat across any small rectangle
in the solid is proportional to the area of the
rectangle and to the normal derivative* of the
temperature, the flow being in the direction of
decreasing T. The constant of proportionality
k(x, y, z), which may depend on the point (x, y, z)
but not on the time t or the direction of the small
rectangle, is called the ‘conductivity’ of the solid
at (x, y, z). (If the conductivity depends on the
direction as well as the position of the rectangle,
the medium is said to be anisotropic and the heat
equation is slightly more complicated.)

(d) Heat is conserved.

*That is, the derivative in the direction
perpendicular to the rectangle.


Expressed in equations, the above assumptions become
(a) a flow ω = ρ dx dy dz − ρu dy dz dt − ρv dz dx dt − ρw dx dy dt,
(b) ρ = cT, (c) ρu = −k ∂T/∂x, ρv = −k ∂T/∂y, ρw = −k ∂T/∂z,
and (d) dω = 0. Putting them together gives

$$d\Big[cT\,dx\,dy\,dz + k\frac{\partial T}{\partial x}\,dy\,dz\,dt + k\frac{\partial T}{\partial y}\,dz\,dx\,dt + k\frac{\partial T}{\partial z}\,dx\,dy\,dt\Big] = 0$$

or

$$c\frac{\partial T}{\partial t} = \frac{\partial}{\partial x}\Big(k\frac{\partial T}{\partial x}\Big) + \frac{\partial}{\partial y}\Big(k\frac{\partial T}{\partial y}\Big) + \frac{\partial}{\partial z}\Big(k\frac{\partial T}{\partial z}\Big),$$

which is the heat equation. If k is constant (the solid is
homogeneous), then the heat equation is simply
c ∂T/∂t = k∇²T, where ∇²T denotes
∂²T/∂x² + ∂²T/∂y² + ∂²T/∂z² (the Laplacian of T).
In particular, if a homogeneous solid is in
thermal equilibrium (∂T/∂t = 0), then the temperature
T(x, y, z) is a harmonic function.
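The heat equation can be explored numerically. The sketch below is a one-dimensional finite-volume discretization chosen for illustration (it is not the text's method): cells of width dx exchange heat through their faces at the rate −k ∂T/∂x of assumption (c), the two ends of the rod are assumed insulated, and each cell's temperature is its heat divided by c·dx, as in assumption (b). Because every face flux is added to one cell and subtracted from its neighbour, the total heat of assumption (d) is conserved up to rounding. The time step obeys the usual stability condition k dt/(c dx²) ≤ 1/2 for this explicit scheme; the grid and the material constants are arbitrary test values, and the function names are hypothetical.

```python
def heat_step(T, c, k, dx, dt):
    """One explicit finite-volume step of c dT/dt = d/dx(k dT/dx) on a 1-D rod.

    T[i] is the temperature of cell i (width dx).  The heat flux -k dT/dx is
    evaluated on the faces between cells; the two end faces carry zero flux
    (an insulated rod), so the total heat is conserved.
    """
    n = len(T)
    flux = [0.0] * (n + 1)  # flux[i] is the flux through the face on the left of cell i
    for i in range(1, n):
        flux[i] = -k * (T[i] - T[i - 1]) / dx
    # the heat in cell i changes by -(outflow - inflow) * dt; divide by c*dx for temperature
    return [T[i] - dt / (c * dx) * (flux[i + 1] - flux[i]) for i in range(n)]


def total_heat(T, c, dx):
    """Amount of heat = sum over cells of (temp.) x (heat capacity) x (volume)."""
    return sum(temp * c * dx for temp in T)


n, c, k = 50, 2.0, 1.0
dx = 1.0 / n
dt = 0.4 * c * dx * dx / k  # explicit scheme needs k*dt/(c*dx^2) <= 1/2
T0 = [1.0 if n // 4 <= i < n // 2 else 0.0 for i in range(n)]
T = T0
for _ in range(2000):
    T = heat_step(T, c, k, dx, dt)
print(total_heat(T0, c, dx), total_heat(T, c, dx))  # equal up to rounding
```

After many steps the temperature is nearly constant, with the same total heat as at the start: the one-dimensional counterpart of thermal equilibrium for an insulated body.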




Potential Theory 


Newton’s law of gravity states that two particles attract 
each other with a force which is proportional to the 
product of their masses and inversely proportional to the 
square of the distance between them. Coulomb’s law 
states that two electrified particles repel each other with 
a force which is proportional to the product of their 
electrical charges (hence they attract when the charges 
have opposite signs) and inversely proportional to the 
square of the distance between them. 

From a mathematical standpoint these two laws are
the same and can be expressed as follows. The law
‘work = force × displacement’ says that the components
of force are the components of a 1-form called
work. In terms of work the inverse square law of attraction
states that the work required to displace a particle
of unit mass (or unit negative charge) in the presence
of a particle of mass (charge) m at (0, 0, 0) is given by
the 1-form

$$\frac{\gamma m\,dr}{r^2} = -d\Big(\frac{\gamma m}{r}\Big),$$

where r = √(x² + y² + z²) and where the constant of
proportionality γ depends on the units used to measure
mass, work, and distance.

More generally, if there are N particles of mass
(charge) m₁, m₂, ..., m_N at the points (x₁, y₁, z₁),
(x₂, y₂, z₂), ..., (x_N, y_N, z_N) then, by the assumption
that forces add vectorially, the amount of work required
for displacements of a unit test particle is given by the
1-form

$$\text{work} = \sum_{i=1}^{N}\frac{\gamma m_i\,dr_i}{r_i^2} = -d\Big(\sum_{i=1}^{N}\frac{\gamma m_i}{r_i}\Big), \tag{1}$$

where rᵢ = √((x − xᵢ)² + (y − yᵢ)² + (z − zᵢ)²). If
there is a continuous distribution of mass, then (1) is
naturally replaced by

$$\text{work} = -d\iiint\frac{\gamma\rho(\xi,\eta,\zeta)\,d\xi\,d\eta\,d\zeta}{\sqrt{(x-\xi)^2 + (y-\eta)^2 + (z-\zeta)^2}}, \tag{1'}$$

where ρ is the density of the distribution of mass and
where the 1-form work refers to displacements of a unit
particle near a point (x, y, z) where there is no mass
(so the integral is well-defined).
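Formula (1) says that the work 1-form is exact, so the work needed to move a unit test particle from one point to another does not depend on the path. The sketch below checks this numerically for two hypothetical point masses, in units where γ = 1 (an assumption of the sketch). It integrates Σ γmᵢ drᵢ/rᵢ² along polygonal paths with a simple midpoint rule and compares the result with the difference of Σ γmᵢ/rᵢ; the paths are chosen to stay well away from the masses, and the function names are hypothetical.

```python
import math

GAMMA = 1.0  # hypothetical units in which the constant of proportionality is 1


def potential(p, masses):
    """Sum of GAMMA * m_i / r_i at the point p; masses = [(m_i, (x_i, y_i, z_i)), ...]."""
    return sum(GAMMA * m / math.dist(p, q) for m, q in masses)


def work_along(path, masses, steps=2000):
    """Integrate the 1-form sum_i GAMMA m_i dr_i / r_i^2 along a polygonal path."""
    total = 0.0
    for p0, p1 in zip(path, path[1:]):
        for k in range(steps):
            a = [p0[j] + (p1[j] - p0[j]) * k / steps for j in range(3)]
            b = [p0[j] + (p1[j] - p0[j]) * (k + 1) / steps for j in range(3)]
            mid = [(a[j] + b[j]) / 2 for j in range(3)]
            for m, q in masses:
                dr = math.dist(b, q) - math.dist(a, q)  # change of r_i over this small step
                total += GAMMA * m * dr / math.dist(mid, q) ** 2
    return total


masses = [(1.0, (0.0, 0.0, 0.0)), (0.5, (2.0, 0.0, 0.0))]
start, end = (0.0, 3.0, 0.0), (4.0, 1.0, 1.0)
direct = work_along([start, end], masses)
detour = work_along([start, (-2.0, -2.0, 3.0), end], masses)
exact = -(potential(end, masses) - potential(start, masses))  # work = -d(sum GAMMA m_i / r_i)
print(direct, detour, exact)  # the three numbers agree closely
```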




The fundamental property of the inverse square law,
which distinguishes it from all other ‘possible’ laws of
attraction that are radially symmetric, is the theorem of
Newton that the force exerted by a homogeneous ball
of total mass M on a particle P outside the ball is the same
as the force exerted on P by a particle of mass M located
at the center of the ball. To prove this it suffices to consider
the case where P has unit mass, in which case the
assertion is that

$$d\Bigg[\frac{\gamma M}{\sqrt{(x-\bar x)^2 + (y-\bar y)^2 + (z-\bar z)^2}}\Bigg] = d\Bigg[\iiint_B\frac{\gamma\rho\,d\xi\,d\eta\,d\zeta}{\sqrt{(x-\xi)^2 + (y-\eta)^2 + (z-\zeta)^2}}\Bigg],$$

where B is the ball of radius r with center (x̄, ȳ, z̄) and
where the density of mass in B is the constant ρ. Since
df = dg if and only if f = g + const. (on the connected region
outside B), and since both of the functions above approach
zero as (x, y, z) moves to an infinite distance from B, it
follows that the equation above holds if and only if

$$\frac{M}{\sqrt{(x-\bar x)^2 + (y-\bar y)^2 + (z-\bar z)^2}} = \iiint_B\frac{\rho\,d\xi\,d\eta\,d\zeta}{\sqrt{(x-\xi)^2 + (y-\eta)^2 + (z-\zeta)^2}}$$

or, since M = ρ∭_B dξ dη dζ, if and only if the value of the
function

$$u(\xi, \eta, \zeta) = \big[(x-\xi)^2 + (y-\eta)^2 + (z-\zeta)^2\big]^{-1/2}$$

at the center of B is equal to its average value over all of
B. Thus Newton's theorem is essentially the statement
that the function

$$\frac{1}{\sqrt{x^2 + y^2 + z^2}}$$

is a harmonic function on xyz-space, defined at all points
except (0, 0, 0). This can be proved (see Exercise 4 of
§8.3) by showing that u(x, y, z) is harmonic if and only
if it satisfies Laplace's equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = 0$$

and by showing that 1/r satisfies this equation.
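Laplace's equation for 1/r can also be checked numerically. This sketch is illustrative only: the step h and the sample points are arbitrary, and the function names are hypothetical. It applies the standard seven-point central-difference approximation of the Laplacian, which is exact for quadratic polynomials and has an error of order h² for smooth functions.

```python
import math


def inv_r(x, y, z):
    """The function 1/r = (x^2 + y^2 + z^2)^(-1/2), defined except at the origin."""
    return 1.0 / math.sqrt(x * x + y * y + z * z)


def laplacian(f, x, y, z, h=1e-3):
    """Seven-point central-difference approximation of f_xx + f_yy + f_zz (error O(h^2))."""
    return (f(x + h, y, z) + f(x - h, y, z)
            + f(x, y + h, z) + f(x, y - h, z)
            + f(x, y, z + h) + f(x, y, z - h)
            - 6.0 * f(x, y, z)) / (h * h)


for point in [(1.0, 0.5, -0.7), (0.2, -1.5, 0.4), (-2.0, 1.0, 3.0)]:
    print(point, laplacian(inv_r, *point))  # all close to 0
```

By contrast, the same routine applied to 1/r² = 1/(x² + y² + z²) returns approximately 2/r⁴, so that function is not harmonic, and applied to x² + y² + z² it returns 6.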




Another important theorem of Newton is that the
total force exerted by a homogeneous spherical shell on a
particle in its interior is zero. This is proved by using (1′)
to show that the work required to displace a unit particle
in the interior is −du, where

$$u(x, y, z) = \iiint_{\text{shell}}\frac{\gamma\rho\,d\xi\,d\eta\,d\zeta}{\sqrt{(x-\xi)^2 + (y-\eta)^2 + (z-\zeta)^2}}.$$

For each fixed (ξ, η, ζ) in the shell the function under the integral
sign is a harmonic function of (x, y, z) for x² + y² + z² < r²,
where r is the inner radius of the shell. Interchanging the
order of integration, it follows that the average of u(x, y, z)
over any ball in x² + y² + z² < r² is its value at the center of
the ball, i.e., u(x, y, z) is harmonic. By symmetry u is constant
on spheres x² + y² + z² = const. < r². Since its average on any
sphere centered at (0, 0, 0) is u(0, 0, 0), it follows that
u = u(0, 0, 0); hence du = 0, as was to be shown.
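Both theorems of Newton come down to the average of 1/|p − q| over a sphere |q| = R. It equals 1/|p| when p is outside the sphere, so a shell (and hence a ball built from shells) attracts like a point mass at its center, and it equals the constant 1/R when p is inside, so an interior particle feels no force. The sketch below checks both cases with a midpoint-rule quadrature in spherical coordinates, chosen for illustration; the function name and grid size are hypothetical.

```python
import math


def sphere_average_inv_dist(p, radius, n=200):
    """Average of 1/|p - q| over the sphere |q| = radius (midpoint rule in theta and phi).

    The area element on the sphere is proportional to sin(theta) dtheta dphi, so each
    sample is weighted by sin(theta) and the sum is divided by the total weight.
    """
    total = weight = 0.0
    for i in range(n):
        theta = (i + 0.5) * math.pi / n
        w = math.sin(theta)
        for j in range(n):
            phi = (j + 0.5) * 2.0 * math.pi / n
            q = (radius * w * math.cos(phi), radius * w * math.sin(phi), radius * math.cos(theta))
            total += w / math.dist(p, q)
            weight += w
    return total / weight


inside = sphere_average_inv_dist((0.3, 0.2, -0.4), 1.0)   # expected 1/R = 1
outside = sphere_average_inv_dist((1.5, -1.0, 2.0), 1.0)  # expected 1/|p| = 1/sqrt(7.25)
print(inside, outside, 1.0 / math.sqrt(7.25))
```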

The inverse square law determines the forces (work)
given the masses. It is of obvious interest to be able to
invert this relation, that is, to find the masses given the
forces. The solution of this problem follows from the
simple observation that the 2-form

$$\omega = \frac{x\,dy\,dz + y\,dz\,dx + z\,dx\,dy}{(x^2 + y^2 + z^2)^{3/2}},$$

defined at all points other than (0, 0, 0), is closed and
that its integral over the sphere x² + y² + z² = 1 is
not zero. The actual value of its integral over the sphere
is 4π, but the only important fact in what follows is that
it is not zero. Thus for any volume V (a compact, differentiable,
three-dimensional manifold-with-boundary,
oriented by dx dy dz),

$$\iint_{\partial V}\omega = \begin{cases} 4\pi & \text{if } (0,0,0) \text{ is inside } V,\\ 0 & \text{if } (0,0,0) \text{ is not in } V,\\ \text{not defined} & \text{if } (0,0,0) \text{ is on } \partial V. \end{cases}$$
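The values in the table can be checked by direct quadrature. The sketch below is an illustration, not the text's argument: it integrates ω over a sphere with arbitrary center and radius, oriented by the outward normal, by writing ω on the surface as F·n dA with F = (x, y, z)/r³ and using a midpoint rule in spherical coordinates. The function name is hypothetical.

```python
import math


def flux_through_sphere(center, radius, n=200):
    """Integral of omega = (x dy dz + y dz dx + z dx dy)/(x^2 + y^2 + z^2)^(3/2) over the
    sphere |q - center| = radius, oriented by the outward normal (midpoint rule).

    On the sphere, omega evaluated on the surface element is (F . n) dA with
    F = (x, y, z)/r^3, n the outward unit normal and dA = radius^2 sin(theta) dtheta dphi.
    """
    d_theta, d_phi = math.pi / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * d_theta
        dA = radius * radius * math.sin(theta) * d_theta * d_phi
        for j in range(n):
            phi = (j + 0.5) * d_phi
            nx = math.sin(theta) * math.cos(phi)
            ny = math.sin(theta) * math.sin(phi)
            nz = math.cos(theta)
            x, y, z = center[0] + radius * nx, center[1] + radius * ny, center[2] + radius * nz
            r3 = (x * x + y * y + z * z) ** 1.5
            total += (x * nx + y * ny + z * nz) / r3 * dA
    return total


print(flux_through_sphere((0.0, 0.0, 0.0), 1.0) / math.pi)   # close to 4: origin inside
print(flux_through_sphere((0.2, -0.1, 0.3), 1.0) / math.pi)  # close to 4: origin inside, off-center
print(flux_through_sphere((3.0, 0.0, 0.0), 1.0))             # close to 0: origin outside
```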


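Now if the 1-form work is

$$A\,dx + B\,dy + C\,dz = -d\Big(\frac{\gamma m}{r}\Big) = \gamma m\,\frac{x\,dx + y\,dy + z\,dz}{(x^2 + y^2 + z^2)^{3/2}},$$

so that A = γmx/r³, B = γmy/r³, C = γmz/r³,
it then follows that

$$A\,dy\,dz + B\,dz\,dx + C\,dx\,dy = \gamma m\,\omega$$

and hence that

$$\iint_{\partial V} A\,dy\,dz + B\,dz\,dx + C\,dx\,dy = \begin{cases} 0 & \text{if the mass is not in } V,\\ 4\pi\gamma m & \text{if the mass is inside } V,\\ \text{not defined} & \text{if the mass is on } \partial V. \end{cases}$$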


Thus the amount of mass and its location can be determined
when A, B, C are known. More generally, if

$$A\,dx + B\,dy + C\,dz = -d\Big(\sum_{i=1}^{N}\frac{\gamma m_i}{r_i}\Big)$$

then

$$\frac{1}{4\pi\gamma}\iint_{\partial V} A\,dy\,dz + B\,dz\,dx + C\,dx\,dy$$

is defined provided that none of the masses lie on ∂V
and is equal to the total mass of the particles in V. This
determines the distribution of masses when work =
A dx + B dy + C dz is known (and when the constant
γ is known).

This problem is simpler when the masses are continuously
distributed, since then the 1-form

work = A dx + B dy + C dz

is defined even at points where mass is present. This fact,
which will not be proved rigorously, can be seen as a
result of the fact that a ball exerts no force on its center,
so that when mass is continuously distributed the
(nearly constant) mass near the point cancels out and
only masses away from the point, where 1/r is defined,
need be counted. In short, the integral

$$u(x, y, z) = \iiint\frac{\gamma\rho(\xi,\eta,\zeta)\,d\xi\,d\eta\,d\zeta}{\sqrt{(x-\xi)^2 + (y-\eta)^2 + (z-\zeta)^2}}$$

converges in a generalized sense (see §9.7), even though
the integrand is not defined at (ξ, η, ζ) = (x, y, z) and
even though it is unbounded near this point. Then


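$$A\,dx + B\,dy + C\,dz = -du$$

is a well-defined 1-form for all (x, y, z) which, by the
inverse square law, is the work required for displacements
of a unit test particle. By the above,

$$\frac{1}{4\pi\gamma}\iint_{\partial V} A\,dy\,dz + B\,dz\,dx + C\,dx\,dy = \text{total mass inside } V = \iiint_V \rho(x, y, z)\,dx\,dy\,dz.$$

By Stokes' Theorem it follows that

$$\iiint_V \rho(x, y, z)\,dx\,dy\,dz = \frac{1}{4\pi\gamma}\iiint_V d\,(A\,dy\,dz + B\,dz\,dx + C\,dx\,dy).$$

Since this holds for every volume V,

$$\rho(x, y, z) = \frac{1}{4\pi\gamma}\Big[\frac{\partial A}{\partial x} + \frac{\partial B}{\partial y} + \frac{\partial C}{\partial z}\Big]$$

or, noting that A dx + B dy + C dz = −du,

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} + 4\pi\gamma\rho = 0. \tag{2}$$

The function u is called the potential function determined
by the mass distribution ρ, and the equation (2) is called
Poisson's equation.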
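For a homogeneous ball of radius R and density ρ centered at the origin, the potential u has the standard closed form γM/r outside the ball and 2πγρ(R² − r²/3) inside. This formula is assumed here rather than derived in the text. The sketch below checks by finite differences that it satisfies Poisson's equation (2) both inside the ball, where ρ ≠ 0, and outside, where u is harmonic. The units γ = ρ = R = 1 and the function names are hypothetical.

```python
import math

# Hypothetical units and data: GAMMA = 1, a ball of radius R = 1 and constant density RHO = 1.
GAMMA, RHO, R = 1.0, 1.0, 1.0
M = 4.0 / 3.0 * math.pi * R ** 3 * RHO  # total mass


def u_ball(x, y, z):
    """Potential of the homogeneous ball centered at the origin (standard closed form)."""
    r = math.sqrt(x * x + y * y + z * z)
    if r >= R:
        return GAMMA * M / r  # outside: like a point mass M at the center
    return 2.0 * math.pi * GAMMA * RHO * (R * R - r * r / 3.0)  # inside


def density(x, y, z):
    return RHO if x * x + y * y + z * z < R * R else 0.0


def poisson_residual(x, y, z, h=1e-3):
    """Finite-difference value of u_xx + u_yy + u_zz + 4 pi GAMMA rho; (2) says it is 0."""
    lap = (u_ball(x + h, y, z) + u_ball(x - h, y, z)
           + u_ball(x, y + h, z) + u_ball(x, y - h, z)
           + u_ball(x, y, z + h) + u_ball(x, y, z - h)
           - 6.0 * u_ball(x, y, z)) / (h * h)
    return lap + 4.0 * math.pi * GAMMA * density(x, y, z)


print(poisson_residual(0.2, 0.1, -0.3), poisson_residual(1.5, 0.5, -1.0))  # both close to 0
```

The two branches of `u_ball` agree at r = R, as they must for a potential that is continuous across the surface of the ball.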

The subject of potential theory is essentially the study
of the inverse square law from the point of view of
Poisson's equation (2). Given the potential function u,
the 1-form −du describes the work required to perform
displacements of a unit test particle, or, as it is more
commonly stated, the 1-form du describes the work
done by the masses ρ during the displacement of the test
particle. At points of space where there is no mass,
Poisson's equation says that the potential function is
harmonic. If the only mass is at (0, 0, 0), then u must be
harmonic except for a singularity at (0, 0, 0); hence, by
radial symmetry, u must be of the form const./r + const.
(see Exercise 1). The condition that u ‘vanish at ∞’ gives
u = const./r. Rewriting (2) as

$$-d\Big(\frac{\partial u}{\partial x}\,dy\,dz + \frac{\partial u}{\partial y}\,dz\,dx + \frac{\partial u}{\partial z}\,dx\,dy\Big) = 4\pi\gamma\rho(x, y, z)\,dx\,dy\,dz$$

and integrating over the ball x² + y² + z² ≤ 1, it
follows that the constant is γm, where m is the mass.
Therefore

$$u = \frac{\gamma m}{r},$$

which shows that the inverse square law follows from
Poisson's equation and the assumption that u(∞) = 0.
In general this argument shows that if u(∞) = 0 and if
Poisson's equation is satisfied, then u can be determined
from ρ by

$$u(x, y, z) = \iiint\frac{\gamma\rho(\xi,\eta,\zeta)\,d\xi\,d\eta\,d\zeta}{\sqrt{(x-\xi)^2 + (y-\eta)^2 + (z-\zeta)^2}}, \tag{3}$$

which is the inverse square law. Thus the equations (2)
and (3) are merely different ways of saying the same
thing.

Historically the Newtonian formulation (1), (3) of
course came first, and when one is dealing with a finite
number of particles or balls (which behave like particles),
this is the simpler formulation to apply. However, it
involves the concept of ‘action at a distance’ between
two bodies, which even to Newton was ‘an absurdity’.*
The formulation (2) reverses the roles and assigns the
‘reality’ to the potential u rather than to the masses ρ,
which are derived from u locally by (2). In his formulation
of electrostatics Maxwell preferred the Poisson
formulation both because it eliminated ‘action at a
distance’ and because the exact nature of ‘electrical
charge’ was even more obscure than that of ‘mass’, so
that ‘potential’ or ‘work’ was a more satisfactory basis
for the theory.

*“That gravity should be innate,
inherent, and essential to matter, so
that one body may act upon another
at a distance through a vacuum,
without the mediation of anything
else, by and through which their
action and force may be conveyed
from one to another, is to me so great
an absurdity, that I believe no man
who has in philosophical matters a
competent faculty of thinking can
ever fall into it.” (Newton's third
letter to Bentley.)


Maxwell’s Equations 


The statement of Coulomb's law assumes that the
medium through which the forces act is homogeneous
and isotropic; that is, the force between two bodies
depends only on the distance which separates them and
not on their particular locations. The constant ε = 1/(4πγ)
is called the dielectric constant of the medium. In terms
of the dielectric constant Poisson's equation takes the
form

$$\varepsilon\nabla^2\phi + \rho = 0, \tag{4}$$

where ρ is the charge density and where the letter denoting
‘potential’ has been changed from u to φ to accord
with standard usage. ∇²φ means, as before,
∂²φ/∂x² + ∂²φ/∂y² + ∂²φ/∂z².

The 1-form −dφ is called the ‘electric force field’ and
is denoted by E; it describes the work contributed by
the force field toward displacements of a unit charged
particle. The 2-form

$$-\varepsilon\Big(\frac{\partial\phi}{\partial x}\,dy\,dz + \frac{\partial\phi}{\partial y}\,dz\,dx + \frac{\partial\phi}{\partial z}\,dx\,dy\Big)$$

is called the ‘electric displacement’ and is denoted by D.
With these definitions Poisson's equation (4) is broken
into three steps

$$E = -d\phi, \qquad D = \varepsilon E, \qquad \rho\,dx\,dy\,dz = dD,$$

where the second ‘equation’ between a 1-form and a
2-form means that the 2-form D is defined in terms of
the 1-form

E = E₁ dx + E₂ dy + E₃ dz

by the equation

D = εE₁ dy dz + εE₂ dz dx + εE₃ dx dy.

These are the equations of electrostatics (no moving
charge).

A moving charged particle is acted on by and exerts
forces other than the forces described by Coulomb's law,
namely magnetic forces. The basic facts concerning magnetic
forces were discovered by Faraday, who found
that magnetic forces can be described by a closed 2-form

B = B₁ dy dz + B₂ dz dx + B₃ dx dy

and that their relation to electrical forces can be described by
the equation

$$d(E\,dt + B) = 0.$$

Needless to say, Faraday did not state his laws in this
form; his description, which was deliberately physical
and not mathematical, was as follows:

The ‘magnetic field’ can be visualized as consisting of
oriented curves in space called the ‘lines of force’.* The
lines of force do not terminate; that is, the number of
lines of force which enter any small three-dimensional
region V of space is equal to the number which leave V.
If S is a surface (compact, oriented, differentiable surface-with-boundary)
in space and if the magnetic field
changes over a time interval {t̄ ≤ t ≤ t̄ + Δt}, then an
electromotive force acts around the curve ∂S. The
total force around ∂S is proportional to the change
during the time interval in the number of lines of force
which cross S (Faraday's Law of Induction). (Like any
force, this force around ∂S does not mean that anything
necessarily moves; it means that if there were a circuit
on ∂S, then one would observe a certain amount of work
being done by the changing lines of force in moving
charge around the circuit.) Since the lines of force do
not end, the change in the number which cross S must
be equal to the number which ‘cut’ across ∂S during the
time interval, so an equivalent statement of Faraday's
law of induction is that the total electromotive force
around a closed curve is proportional to the number of
lines of force which cut the curve during the time interval.

*The physical reality of the lines of
force can be seen in the way that
iron filings distribute themselves in
the presence of a magnetic field.

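Mathematically, the quantity ‘number of lines of
force which cross the surface S at time t’ is naturally
represented as the integral ∬_S B of a closed 2-form B
depending on t, and Faraday's law is

$$\int_{\bar t}^{\bar t+\Delta t}\Big(\int_{\partial S}E\Big)\,dt = \text{const.}\times\text{change in}\iint_S B.$$

When the unit of E is chosen (E = −dφ, so the unit of
E is potential and the units of E₁, E₂, E₃ are potential/length),
the unit of B can be chosen to make the constant
of proportionality equal to −1, so that the equation
becomes

$$\int_{\bar t}^{\bar t+\Delta t}\Big(\int_{\partial S}E\Big)\,dt + \iint_S\Big(\int_{\bar t}^{\bar t+\Delta t}\frac{\partial B}{\partial t}\,dt\Big) = 0$$

(here ∂B/∂t is the 2-form obtained by differentiating the
coefficients of B with respect to t), that is,

$$\iiint_{I\times S} d(E\,dt + B) = 0.$$

Here I × S is the 3-manifold {(x, y, z, t) : (x, y, z) is in
S and t̄ ≤ t ≤ t̄ + Δt}, and the fact that dB = 0 for
each fixed t has been used. Thus the integral of
d(E dt + B) over any 3-rectangle in any of the 4 coordinate
directions is 0 (in the xyz-direction it is zero because
dB = 0), which shows that Faraday's law is indeed
equivalent to

$$d(E\,dt + B) = 0$$

when the units of B are properly chosen. (In terms of
the components of E and B this equation is

$$d[E_1\,dx\,dt + E_2\,dy\,dt + E_3\,dz\,dt + B_1\,dy\,dz + B_2\,dz\,dx + B_3\,dx\,dy] = 0,$$

that is,

$$\Big(\frac{\partial B_1}{\partial x} + \frac{\partial B_2}{\partial y} + \frac{\partial B_3}{\partial z}\Big)dx\,dy\,dz + \Big(\frac{\partial E_3}{\partial y} - \frac{\partial E_2}{\partial z} + \frac{\partial B_1}{\partial t}\Big)dy\,dz\,dt + \Big(\frac{\partial E_1}{\partial z} - \frac{\partial E_3}{\partial x} + \frac{\partial B_2}{\partial t}\Big)dz\,dx\,dt + \Big(\frac{\partial E_2}{\partial x} - \frac{\partial E_1}{\partial y} + \frac{\partial B_3}{\partial t}\Big)dx\,dy\,dt = 0$$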


*One cannot write j₁ = ρu, j₂ = ρv,
j₃ = ρw because ρ (the net charge
density) may be zero even when the
current is not; this occurs, for
example, when there is a stationary
positive charge and a moving
negative charge.


or, as it is usually stated in physics books, div B = 0,
curl E + ∂B/∂t = 0.)
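The component form of d(E dt + B) = 0 can be checked on an example. The sketch below assumes a hypothetical plane wave E = E₀ cos(kx − ωt) dy, B = (k/ω) E₀ cos(kx − ωt) dx dy, with arbitrary constants, and evaluates the four coefficients above by central differences. All four vanish, for any choice of k and ω; the function names are hypothetical.

```python
import math

# Hypothetical plane wave travelling along the x-axis (amplitude, wave number, frequency).
E0, K, OMEGA = 1.0, 2.0, 3.0


def E(x, y, z, t):
    """Components (E1, E2, E3) of the 1-form E = E1 dx + E2 dy + E3 dz."""
    return (0.0, E0 * math.cos(K * x - OMEGA * t), 0.0)


def B(x, y, z, t):
    """Components (B1, B2, B3) of the 2-form B = B1 dy dz + B2 dz dx + B3 dx dy."""
    return (0.0, 0.0, (K / OMEGA) * E0 * math.cos(K * x - OMEGA * t))


def partial(field, i, var, p, h=1e-5):
    """Central difference of component i of field w.r.t. coordinate var (0=x, 1=y, 2=z, 3=t)."""
    plus, minus = list(p), list(p)
    plus[var] += h
    minus[var] -= h
    return (field(*plus)[i] - field(*minus)[i]) / (2 * h)


def faraday_coefficients(p):
    """Coefficients of d(E dt + B) on dx dy dz, dy dz dt, dz dx dt, dx dy dt at p = (x, y, z, t)."""
    X, Y, Z, T = 0, 1, 2, 3

    def dE(i, var):
        return partial(E, i, var, p)

    def dB(i, var):
        return partial(B, i, var, p)

    return (
        dB(0, X) + dB(1, Y) + dB(2, Z),  # div B
        dE(2, Y) - dE(1, Z) + dB(0, T),  # x-component of curl E + dB/dt
        dE(0, Z) - dE(2, X) + dB(1, T),  # y-component
        dE(1, X) - dE(0, Y) + dB(2, T),  # z-component
    )


print(faraday_coefficients((0.3, -0.2, 0.7, 0.4)))  # all four entries close to 0
```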


The closed 2-form E dt + B on xyzt-space is called
the electromagnetic field. The presence of E is related to
the presence of charges, and the presence of B is related
to the presence of moving charges. Altogether the charge
and its motion are described by a 3-form

$$J = \rho\,dx\,dy\,dz - j_1\,dy\,dz\,dt - j_2\,dz\,dx\,dt - j_3\,dx\,dy\,dt,$$

where ρ is the charge density and (j₁, j₂, j₃) are the components
of the current* giving the amount of charge
crossing surfaces in xyz-space per unit area per unit
time. The assumption that charge is conserved is the
assumption that dJ = 0. The presence of the field
E dt + B is related to the presence of the flow of charge J;
Maxwell's equations give this relationship in explicit
form.

When the charge is stationary (j₁ = j₂ = j₃ = 0),
there is no magnetic force (B = 0), and E is related to ρ
by the equations D = εE, dD = ρ dx dy dz as above.
The desired relation (Maxwell's equations) between
E dt + B and J must be a generalization of this relation
to the case in which J has non-zero terms in dt.

The explicit relation between moving charge and B is
found by placing a wire on the z-axis and running current
along the wire. This creates a magnetic field in the vicinity
of the wire, as was discovered by Oersted. When this
field is measured, its magnitude is found to be proportional
to the current, and its lines of force are found to be
circles in planes z = const. whose centers are on the
wire and whose density is inversely proportional to the
distance from the wire. In short, the field is

$$B = \text{const.}\times\text{current}\times\frac{dr\,dz}{r},$$

where r = √(x² + y²). Or, denoting the constant of proportionality
by α and the current by j₃,

$$B = \alpha j_3\,\frac{dr\,dz}{r} = \alpha j_3\,\frac{x\,dx\,dz + y\,dy\,dz}{x^2 + y^2}.$$

If B = B₁ dx dz + B₂ dy dz is known, then the rate of
flow of charge across any small surface S can be found
by the equation

$$\int_{\partial S}(B_2\,dx - B_1\,dy) = \alpha j_3\int_{\partial S}\frac{y\,dx - x\,dy}{x^2 + y^2} = \begin{cases} -2\pi\alpha j_3 & \text{if } \partial S \text{ goes once around the } z\text{-axis (counterclockwise)},\\ 0 & \text{otherwise.} \end{cases}$$

Therefore in all cases

$$-\frac{1}{2\pi\alpha}\int_{\partial S}(B_2\,dx - B_1\,dy) = \text{rate at which charge crosses } S.$$

The constant α is in fact negative (the orientation of B
having been chosen above), so that the constant
μ = −2πα is positive. Since units of charge, length, and
time, and hence of potential and of magnetic force, have
been chosen, μ is a well-defined quantity; it is called the
‘magnetic permeability’ of the medium.
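The rule for recovering the current from the field can be illustrated numerically. The sketch below assumes units in which μ = 1 (so α = −1/(2π), from μ = −2πα), builds B = B₁ dx dz + B₂ dy dz for a hypothetical current j₃ on the z-axis, and evaluates −(1/(2πα))∫(B₂ dx − B₁ dy) around counterclockwise circles with a simple quadrature. The function names are hypothetical.

```python
import math

MU = 1.0                     # hypothetical units in which the permeability is 1
ALPHA = -MU / (2 * math.pi)  # the text's relation mu = -2 pi alpha


def wire_field(x, y, j3):
    """Coefficients (B1, B2) of B = B1 dx dz + B2 dy dz for current j3 along the z-axis."""
    s = x * x + y * y
    return ALPHA * j3 * x / s, ALPHA * j3 * y / s


def charge_rate(center, radius, j3, n=4000):
    """-(1/(2 pi alpha)) times the integral of B2 dx - B1 dy over a counterclockwise circle."""
    total = 0.0
    for k in range(n):
        a0, a1 = 2 * math.pi * k / n, 2 * math.pi * (k + 1) / n
        am = 0.5 * (a0 + a1)
        x = center[0] + radius * math.cos(am)
        y = center[1] + radius * math.sin(am)
        dx = radius * (math.cos(a1) - math.cos(a0))
        dy = radius * (math.sin(a1) - math.sin(a0))
        b1, b2 = wire_field(x, y, j3)
        total += b2 * dx - b1 * dy
    return -total / (2 * math.pi * ALPHA)


print(charge_rate((0.2, 0.1), 1.0, j3=2.5))  # close to 2.5: the circle goes around the wire
print(charge_rate((3.0, 0.0), 1.0, j3=2.5))  # close to 0: the circle does not
```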

If currents flow along any collection of wires parallel to
the coordinate axes and if the resulting magnetic field
B = B₁ dy dz + B₂ dz dx + B₃ dx dy is known, then
the currents can be found:

$$\frac{1}{\mu}\int_{\partial S}(B_1\,dx + B_2\,dy + B_3\,dz) = \text{rate of flow of charge across } S.$$

Assuming the charge flow to be continuously distributed,
this can be stated

$$\int_{\partial S} H = \iint_S (j_1\,dy\,dz + j_2\,dz\,dx + j_3\,dx\,dy),$$

where

$$H = \frac{1}{\mu}B_1\,dx + \frac{1}{\mu}B_2\,dy + \frac{1}{\mu}B_3\,dz.$$

The relation between B and H will be abbreviated
B = μH (analogous to the abbreviated equation
D = εE). Thus, by Stokes' theorem,

$$dH = j_1\,dy\,dz + j_2\,dz\,dx + j_3\,dx\,dy.$$

In cases in which this equation holds (currents on wires)
the charge density ρ and the electrical field E are constant
in time, so that this equation can be combined
with dD = ρ dx dy dz to give

$$d[D - H\,dt] = \rho\,dx\,dy\,dz - j_1\,dy\,dz\,dt - j_2\,dz\,dx\,dt - j_3\,dx\,dy\,dt.$$

Maxwell concluded that this relation must hold even
when ρ is not constant in time, that is, that

$$d(D - H\,dt) = J$$

gives the desired relation between the motion of charge
and the electromagnetic field.
In summary, the equations are

field = E dt + B
moving charge = J
d(E dt + B) = 0, dJ = 0
D = εE, B = μH
d(D − H dt) = J.


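These are known collectively as Maxwell's equations,
although only the last equation is actually Maxwell's
discovery. (In terms of the components E₁, E₂, E₃,
B₁, B₂, B₃ of the field this equation is

$$d\Big[\varepsilon E_1\,dy\,dz + \varepsilon E_2\,dz\,dx + \varepsilon E_3\,dx\,dy - \frac{1}{\mu}B_1\,dx\,dt - \frac{1}{\mu}B_2\,dy\,dt - \frac{1}{\mu}B_3\,dz\,dt\Big] = \rho\,dx\,dy\,dz - j_1\,dy\,dz\,dt - j_2\,dz\,dx\,dt - j_3\,dx\,dy\,dt,$$

which gives

$$\begin{aligned}
\varepsilon\Big(\frac{\partial E_1}{\partial x} + \frac{\partial E_2}{\partial y} + \frac{\partial E_3}{\partial z}\Big) &= \rho\\
\frac{1}{\mu}\Big(\frac{\partial B_3}{\partial y} - \frac{\partial B_2}{\partial z} - \varepsilon\mu\frac{\partial E_1}{\partial t}\Big) &= j_1\\
\frac{1}{\mu}\Big(\frac{\partial B_1}{\partial z} - \frac{\partial B_3}{\partial x} - \varepsilon\mu\frac{\partial E_2}{\partial t}\Big) &= j_2\\
\frac{1}{\mu}\Big(\frac{\partial B_2}{\partial x} - \frac{\partial B_1}{\partial y} - \varepsilon\mu\frac{\partial E_3}{\partial t}\Big) &= j_3,
\end{aligned}$$

which can be abbreviated ε div E = ρ, curl B − εμ ∂E/∂t = μj,
or div D = ρ, curl H − ∂D/∂t = j, where j =
(j₁, j₂, j₃).)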
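These equations say, in particular, that a plane wave can exist without any charge or current only if it travels with a particular speed. The sketch below takes the hypothetical plane wave E = E₀ cos(kx − ωt) dy, B = (k/ω) E₀ cos(kx − ωt) dx dy, which satisfies d(E dt + B) = 0 for any k and ω, and reads off ρ, j₁, j₂, j₃ from the four equations above by central differences. With ω/k = (εμ)^(−1/2) all four vanish; with another frequency j₂ does not, so that field would require a current. The values of ε, μ, E₀, k and the function names are arbitrary, hypothetical choices.

```python
import math

# Hypothetical test values: EPS * MU = 1, so the wave velocity (EPS*MU)**-0.5 is 1.
EPS, MU = 2.0, 0.5
E0, K = 1.0, 2.0
OMEGA = K / math.sqrt(EPS * MU)  # frequency of a wave moving with velocity (eps mu)^(-1/2)


def fields(x, y, z, t, omega):
    """(E1, E2, E3) and (B1, B2, B3) of a plane wave moving along the x-axis."""
    phase = K * x - omega * t
    E = (0.0, E0 * math.cos(phase), 0.0)
    B = (0.0, 0.0, (K / omega) * E0 * math.cos(phase))  # chosen so that d(E dt + B) = 0
    return E, B


def partial(which, i, var, p, omega, h=1e-5):
    """Central difference of component i of E (which=0) or B (which=1) w.r.t. coordinate var."""
    plus, minus = list(p), list(p)
    plus[var] += h
    minus[var] -= h
    return (fields(*plus, omega)[which][i] - fields(*minus, omega)[which][i]) / (2 * h)


def sources(p, omega=OMEGA):
    """rho, j1, j2, j3 read off from d(D - H dt) = J at p = (x, y, z, t)."""
    X, Y, Z, T = 0, 1, 2, 3

    def dE(i, var):
        return partial(0, i, var, p, omega)

    def dB(i, var):
        return partial(1, i, var, p, omega)

    rho = EPS * (dE(0, X) + dE(1, Y) + dE(2, Z))
    j1 = (dB(2, Y) - dB(1, Z) - EPS * MU * dE(0, T)) / MU
    j2 = (dB(0, Z) - dB(2, X) - EPS * MU * dE(1, T)) / MU
    j3 = (dB(1, X) - dB(0, Y) - EPS * MU * dE(2, T)) / MU
    return rho, j1, j2, j3


P = (0.3, -0.2, 0.7, 0.4)
print(sources(P))                     # all close to 0: a wave with J = 0
print(sources(P, omega=1.5 * OMEGA))  # j2 clearly nonzero: this field would need a current
```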
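Maxwell* showed that the equations of electromagnetism
can be formulated in a compact way in terms of a
‘potential’ so that they are analogous to Poisson's equation
ε∇²φ + ρ = 0 for electrostatics and reduce to this
equation in cases in which there is no current. Since
d(E dt + B) = 0 there is† a 1-form A = A₁ dx + A₂ dy +
A₃ dz + A₄ dt such that −dA = E dt + B.
(There are many such 1-forms, for example, A + df,
where f is any function.) Writing the components of
E dt + B in terms of the derivatives of the components
of A and substituting into the equation d(D − H dt) = J
gives

$$\begin{aligned}
&-\varepsilon\Big[\frac{\partial^2 A_4}{\partial x^2} + \frac{\partial^2 A_4}{\partial y^2} + \frac{\partial^2 A_4}{\partial z^2} - \frac{\partial}{\partial t}\Big(\frac{\partial A_1}{\partial x} + \frac{\partial A_2}{\partial y} + \frac{\partial A_3}{\partial z}\Big)\Big]dx\,dy\,dz\\
&-\frac{1}{\mu}\Big[\frac{\partial^2 A_1}{\partial x^2} + \frac{\partial^2 A_1}{\partial y^2} + \frac{\partial^2 A_1}{\partial z^2} - \varepsilon\mu\frac{\partial^2 A_1}{\partial t^2} - \frac{\partial}{\partial x}\Big(\frac{\partial A_1}{\partial x} + \frac{\partial A_2}{\partial y} + \frac{\partial A_3}{\partial z} - \varepsilon\mu\frac{\partial A_4}{\partial t}\Big)\Big]dy\,dz\,dt\\
&-\cdots = J,
\end{aligned}$$

where the omitted dz dx dt and dx dy dt terms are obtained
from the dy dz dt term by replacing A₁ and ∂/∂x by A₂ and
∂/∂y, respectively by A₃ and ∂/∂z.

*Actually Maxwell's formulation was somewhat different.
The formulation given here is due to Lorentz.

†By Poincaré's Lemma (§8.6).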




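If the components of A are assumed to satisfy the additional
condition

$$\frac{\partial A_1}{\partial x} + \frac{\partial A_2}{\partial y} + \frac{\partial A_3}{\partial z} - \varepsilon\mu\frac{\partial A_4}{\partial t} = 0, \tag{5}$$

this equation is simply

$$\varepsilon\,\Box^2 A_4\,dx\,dy\,dz + \frac{1}{\mu}\Box^2 A_1\,dy\,dz\,dt + \frac{1}{\mu}\Box^2 A_2\,dz\,dx\,dt + \frac{1}{\mu}\Box^2 A_3\,dx\,dy\,dt + J = 0,$$

where □² denotes the ‘differential operator’ assigning to
a function f(x, y, z, t) the function

$$\Box^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} - \varepsilon\mu\frac{\partial^2 f}{\partial t^2}.$$

This differential operator is called the D'Alembertian.
Defining Φ to be the 3-form

$$\Phi = A_4\,dx\,dy\,dz + \frac{1}{\varepsilon\mu}A_1\,dy\,dz\,dt + \frac{1}{\varepsilon\mu}A_2\,dz\,dx\,dt + \frac{1}{\varepsilon\mu}A_3\,dx\,dy\,dt,$$

the assumption (5) on the coefficients of A is simply the
assumption

$$d\Phi = 0 \tag{5'}$$

and the equations reduce to

$$\varepsilon\,\Box^2\Phi + J = 0, \tag{4'}$$

where the D'Alembertian □² is applied to a 3-form by
applying it to each component separately.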

A 3-form Φ is called a vector potential for the flow of
charge J if the equations (4′), (5′) are satisfied. When this
is the case an electromagnetic field E dt + B whose
charge flow is J can be obtained by setting

$$\Phi = \phi\,dx\,dy\,dz - \psi_1\,dy\,dz\,dt - \psi_2\,dz\,dx\,dt - \psi_3\,dx\,dy\,dt$$

and

$$E\,dt + B = -d(\phi\,dt - \varepsilon\mu\psi_1\,dx - \varepsilon\mu\psi_2\,dy - \varepsilon\mu\psi_3\,dz).$$

Conversely, it can be shown that any electromagnetic
field E dt + B can be derived from a vector potential Φ
in this way.

The equation dΦ = 0 means that Φ can be interpreted
as a flow of a quantity called ‘potential’ which is preserved;
φ is the density of this quantity in space and
ψ₁, ψ₂, ψ₃ give the rate (per unit area per unit time) at
which it crosses surfaces in space. When the potential is
stationary (ψ₁ = ψ₂ = ψ₃ = 0), the equation dΦ = 0
implies φ is independent of t and the equation
ε□²Φ + J = 0 reduces to

$$\varepsilon\Big(\frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} + \frac{\partial^2\phi}{\partial z^2}\Big)dx\,dy\,dz + \rho\,dx\,dy\,dz = 0,$$

that is, reduces to Poisson's equation (4).
When J = 0 (no net charge and no net motion of
charge) the equation ε□²Φ + J = 0 reduces to

$$\Box^2\phi = 0, \qquad \Box^2\psi_1 = 0, \qquad \Box^2\psi_2 = 0, \qquad \Box^2\psi_3 = 0.$$

A typical solution f of the equation

$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} - \varepsilon\mu\frac{\partial^2 f}{\partial t^2} = 0$$

(that is, □²f = 0) is a function of the form

$$f(x, y, z, t) = A\sin\big(B(a_1x + a_2y + a_3z - ct) + C\big),$$

where a₁² + a₂² + a₃² = 1, c² = 1/(εμ), and A, B, C are
arbitrary constants. This function can be described as a
wave moving in the direction (a₁, a₂, a₃) with velocity c.
For this reason the equation □²f = 0 is called the ‘wave
equation’ and the number

$$\frac{1}{\sqrt{\varepsilon\mu}}$$

is called the ‘wave velocity’. A solution Φ of ε□²Φ = 0
can be thought of as being made up of many such
‘waves moving with velocity (εμ)^(−1/2)’, which would
lead one to expect that, when J = 0, electromagnetic
disturbances propagate with the velocity (εμ)^(−1/2). The
constants ε, μ can be measured by electromagnetic experiments
and the wave velocity (εμ)^(−1/2) determined. Maxwell
determined these velocities for various media and
found that they agreed with the velocity of light in these
media (to within the limits of experimental error). He
concluded that light is an electromagnetic phenomenon,
one of the most important discoveries in the history of
physics. More generally, an ‘electromagnetic wave’ is an
electromagnetic field for which J = 0. The possibility
of generating electromagnetic waves by electrical means
led to the invention of radio.
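Two facts from this passage are easy to check numerically: that A sin(B(a₁x + a₂y + a₃z − ct) + C) satisfies □²f = 0 when a₁² + a₂² + a₃² = 1 and c² = 1/(εμ), and that (εμ)^(−1/2) is the speed of light when ε and μ are the vacuum constants. The sketch below uses the SI values of the vacuum permittivity and permeability (CODATA 2018), an assumption brought in from outside the text, together with a central-difference D'Alembertian; the remaining constants and the function names are arbitrary, hypothetical choices.

```python
import math

# Vacuum permittivity and permeability in SI units (CODATA 2018 values). Treating
# them as the text's epsilon and mu for empty space is an assumption of this sketch.
EPSILON_0 = 8.8541878128e-12  # farad per metre
MU_0 = 1.25663706212e-6       # newton per ampere squared


def wave_velocity(eps, mu):
    """The 'wave velocity' (eps * mu) ** (-1/2)."""
    return 1.0 / math.sqrt(eps * mu)


def dalembertian(f, x, y, z, t, eps, mu, h=1e-3):
    """Central-difference approximation of f_xx + f_yy + f_zz - eps*mu*f_tt."""
    def second(dx, dy, dz, dt):
        return (f(x + dx, y + dy, z + dz, t + dt) - 2.0 * f(x, y, z, t)
                + f(x - dx, y - dy, z - dz, t - dt)) / (h * h)
    return (second(h, 0, 0, 0) + second(0, h, 0, 0) + second(0, 0, h, 0)
            - eps * mu * second(0, 0, 0, h))


# A plane wave A sin(B(a1 x + a2 y + a3 z - c t) + C) with a1^2 + a2^2 + a3^2 = 1 and
# c = (eps mu)^(-1/2); eps = 4 and mu = 1 are arbitrary test values, so c = 0.5.
EPS, MU = 4.0, 1.0
c = wave_velocity(EPS, MU)


def plane_wave(x, y, z, t):
    return 1.5 * math.sin(2.0 * (0.6 * x + 0.0 * y + 0.8 * z - c * t) + 0.3)


print(wave_velocity(EPSILON_0, MU_0))                          # approximately 2.998e8 m/s
print(dalembertian(plane_wave, 0.1, -0.4, 0.25, 0.7, EPS, MU))  # close to 0
```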




Lorentz Transformations 


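The formulation of Maxwell's equations as dΦ = 0,
ε□²Φ + J = 0 makes it easy to find the manner in
which they transform under an affine change of coordinates

$$\begin{aligned}
x' &= a_{11}x + a_{12}y + a_{13}z + a_{14}t + b_1\\
y' &= a_{21}x + a_{22}y + a_{23}z + a_{24}t + b_2\\
z' &= a_{31}x + a_{32}y + a_{33}z + a_{34}t + b_3\\
t' &= a_{41}x + a_{42}y + a_{43}z + a_{44}t + b_4.
\end{aligned} \tag{6}$$

Expressing the 3-forms Φ, J in terms of (x′, y′, z′, t′), the
equation dΦ = 0 is unchanged (i.e., the statement that
a 3-form is closed has intrinsic meaning independent of
the coordinates), and the equation ε□²Φ + J = 0 becomes
another equation of the same form in which □²
is the differential operator

$$\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} - \varepsilon\mu\frac{\partial^2}{\partial t^2}$$

expressed in terms of (x′, y′, z′, t′). This differential
operator can be found explicitly by noting that

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial x'}\frac{\partial x'}{\partial x} + \frac{\partial f}{\partial y'}\frac{\partial y'}{\partial x} + \cdots = a_{11}\frac{\partial f}{\partial x'} + a_{21}\frac{\partial f}{\partial y'} + a_{31}\frac{\partial f}{\partial z'} + a_{41}\frac{\partial f}{\partial t'}.$$

Hence

$$\frac{\partial^2 f}{\partial x^2} = a_{11}^2\frac{\partial^2 f}{\partial x'^2} + a_{11}a_{21}\frac{\partial^2 f}{\partial x'\,\partial y'} + \cdots + a_{21}a_{11}\frac{\partial^2 f}{\partial y'\,\partial x'} + \cdots.$$

All together

$$\Box^2 f = c_{11}\frac{\partial^2 f}{\partial x'^2} + c_{12}\frac{\partial^2 f}{\partial x'\,\partial y'} + \cdots + c_{44}\frac{\partial^2 f}{\partial t'^2},$$

where there are 16 terms and where the coefficients c_ij
are the entries of the 4 × 4 symmetric matrix

$$C = AMA^{t},$$

where

$$M = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & -\varepsilon\mu \end{pmatrix}, \qquad A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14}\\ a_{21} & a_{22} & a_{23} & a_{24}\\ a_{31} & a_{32} & a_{33} & a_{34}\\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}.$$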


Chapter8 | Applications 350 


. ax 
*Note that vis ot when x’ = O; that 


is, it is the velocity of the point 

(x’, y’, 2’) = (0, O, O) in xyz 
coordinates. Note also that eu is a 
very small number so that Y is nearly 
7 and the transformation Is nearly 
X=x-wyY=y2z2=2,t=t 
unless v is very large. 


tMore precisely, the Lorentz 
transformations obtained in this way 
are proper Lorentz transformations. 
They do not include reflections in a 
plane of xyz-space or time reversal. 


tProvided the change of coordinates 
(6) preserves the orientation, i.e. 
provided det A > 0. 


Thus the expression of Maxwell’s equations dẹ = 0, 
eO’ + J = 0 in terms of (x’, y’, z’, t’) is found essen- 
tially by computing the matrix AMA’. 

A transformation of coordinates (6) is called a Lorentz transformation if Maxwell's equations dΦ = 0, ε□²Φ + J = 0 have the same form with respect to (x', y', z', t') as they do with respect to (x, y, z, t); that is, a Lorentz transformation is an affine transformation of coordinates (6) such that AMAᵀ = M. Typical Lorentz transformations are translations (x' = x + const., y' = y + const., z' = z + const., t' = t + const.); rotations of xyz-space (x', y', z' a rotation of x, y, z, while t' = t); and the transformation


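$$\begin{aligned} x' &= \gamma(x - vt)\\ y' &= y\\ z' &= z\\ t' &= \gamma(-\varepsilon\mu v x + t) \end{aligned} \qquad \gamma = \frac{1}{\sqrt{1 - \varepsilon\mu v^{2}}} \tag{7}$$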


where v is any constant* less than the speed of light (i.e., εμv² < 1) and where γ is obtained from v as indicated. It is not difficult to show that every† Lorentz transformation is a composition of these three types: translations, rotations of xyz-space, and the transformation (7).
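The criterion AMAᵀ = M can be checked exactly for the types of transformation just listed. This Python sketch assumes the same convention a_ij = ∂x'_i/∂x_j as the formula for ∂f/∂x; the values εμ = 1/4 and v = 6/5 are hypothetical, chosen so that εμv² = 9/25 and γ = 5/4 is rational, and the rotation uses cos θ = 3/5, sin θ = 4/5 so that all arithmetic stays exact. Translations are omitted because they do not change A. The function names are hypothetical.

```python
from fractions import Fraction


def amat(A, eps_mu):
    """Return A M A^T for M = diag(1, 1, 1, -eps_mu); A[i][j] = d x'_i / d x_j."""
    m = [1, 1, 1, -eps_mu]
    return [[sum(A[i][k] * m[k] * A[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


def is_lorentz(A, eps_mu):
    """The condition A M A^T = M defining a Lorentz transformation."""
    m = [1, 1, 1, -eps_mu]
    M = [[m[i] if i == j else 0 for j in range(4)] for i in range(4)]
    return amat(A, eps_mu) == M


eps_mu = Fraction(1, 4)     # hypothetical value, so the speed of light is 2
v = Fraction(6, 5)          # eps_mu * v**2 = 9/25 < 1
gamma = Fraction(5, 4)      # equals 1 / sqrt(1 - eps_mu * v**2), checked below
assert gamma**2 * (1 - eps_mu * v**2) == 1

boost = [[gamma, 0, 0, -gamma * v],             # x' = gamma (x - v t)
         [0, 1, 0, 0],                          # y' = y
         [0, 0, 1, 0],                          # z' = z
         [-gamma * eps_mu * v, 0, 0, gamma]]    # t' = gamma (-eps_mu v x + t)

cos_t, sin_t = Fraction(3, 5), Fraction(4, 5)   # cos and sin of one rotation angle
rotation = [[cos_t, sin_t, 0, 0], [-sin_t, cos_t, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

galilean = [[1, 0, 0, -v], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

print(is_lorentz(boost, eps_mu), is_lorentz(rotation, eps_mu), is_lorentz(galilean, eps_mu))
# True True False
```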

Not only are the equations dΦ = 0, ε□²Φ + J = 0 unchanged by a Lorentz transformation, but all of the equations

$$\begin{aligned} E\,dt + B &= -d(\varphi\,dt - \varepsilon\mu\psi_1\,dx - \varepsilon\mu\psi_2\,dy - \varepsilon\mu\psi_3\,dz)\\ D &= \varepsilon E,\qquad B = \mu H,\qquad dJ = 0\\ d(D - H\,dt) &= J \end{aligned}$$

are unchanged‡ (see Exercise 3). Thus if the field E dt + B is derived from Φ in xyzt-coordinates and then transformed to x'y'z't'-coordinates, the result is the same field (2-form in (x', y', z', t')) as is obtained by first converting Φ to x'y'z't'-coordinates and then deriving the field by the first equation above. Similarly, the 2-form D − H dt can be obtained from the 2-form E dt + B by the rules D = εE, B = μH either before or after performing a (proper) Lorentz transformation of the coordinates, and the result is the same.

The fact that the laws of electromagnetism are un- 
changed by a Lorentz transformation of the coordinates 
is very useful in applications because it allows one to 
choose new coordinates in which the problem at hand is 


8.8 | Applications to Mathematical Physics 351 


simpler. For example, to find the electromagnetic field 
generated by a moving magnet, one can first perform a 
Lorentz transformation of coordinates such that the 
magnet is stationary with respect to the new coordinates, 
find the field (now purely magnetic) which it generates 
in these coordinates, and return to the original coor- 
dinates to find the desired field (which is partly electrical). 


Special Relativity 


The fundamental postulate of Einstein’s theory of special 
relativity is simply this: All laws of physics should, like
Maxwell’s laws of electrodynamics, be unchanged by 
Lorentz transformations of the coordinates. The motiva- 
tion of this postulate is, briefly, as follows: 

It is a fundamental postulate of Newtonian physics 
that the notion of velocity has no intrinsic physical 
meaning; that is, a body in uniform motion in a straight 
line cannot be distinguished from a body at rest. For 
example, it is not meaningful to say that ‘the sun is 
stationary’ but only that ‘the motion of the sun is
unaccelerated’. Since the velocity of light enters into the 
formulation of Maxwell’s laws of electrodynamics in an 
essential way, Maxwell’s laws are compatible with New- 
tonian physics only if there is a notion of ‘rest’ relative 
to which the velocity of light is defined. If there is such 
a physically meaningful notion of rest, then the applica- 
tion of Maxwell’s laws to an actual physical system 
requires that one first determine its motion relative to 
absolute rest. This adaptation of the laws of electro- 
dynamics to Newton’s postulates is unsatisfactory both 
from a philosophical standpoint—because the notion of 
‘absolute rest’ is contrary to the spirit of the Newtonian 
postulates which it is trying to salvage—and from a 
practical standpoint—because no ‘absolute motion’ of 
the earth has been detected experimentally (the Michel- 
son-Morley experiment), even though the motion of the 
earth relative to the sun varies greatly during the course 
of the year. In this context, Einstein’s postulate can be 
regarded as the postulate that the laws of electrody- 
namics take precedence over the Newtonian postulates, 
and that the latter, not the former, need to be revised. 


Mass and Energy 


The revision of Newtonian mechanics in accordance 
with the postulate of special relativity is not at all simple. 


For example, in seeking a ‘relativistic’ version of the fundamental law

force = mass × acceleration

one must make a fundamental change in one's conception of ‘mass’. Einstein asserts, in fact, that “mass and energy are essentially alike”, even though the original idea of mass was inertia, which is virtually the opposite of energy. The argument by which Einstein arrived at this amazing conclusion was roughly as follows:

Assuming that matter consists of charged particles and assuming that all forces are electromagnetic, the electricity of the system is described by a closed 3-form Φ (vector potential) from which the electromagnetic field and the distribution of moving charge J = −ε□²Φ can be derived.

The rule

$$\varphi\,dx\,dy\,dz - \psi_1\,dy\,dz\,dt - \psi_2\,dz\,dx\,dt - \psi_3\,dx\,dy\,dt \;\longleftrightarrow\; \varphi\,dt - \varepsilon\mu\psi_1\,dx - \varepsilon\mu\psi_2\,dy - \varepsilon\mu\psi_3\,dz$$

used in deriving the electromagnetic field from Φ is, as was stated above, ‘Lorentz invariant’; that is, the 3-form determines the 1-form, and after a (proper) Lorentz transformation of coordinates the same 3-form determines the same 1-form. Similarly, the 1-form

$$\rho\,dt - \varepsilon\mu j_1\,dx - \varepsilon\mu j_2\,dy - \varepsilon\mu j_3\,dz$$

determined by J is unchanged by a (proper) Lorentz transformation. Multiplying by the 2-form D − H dt gives the 3-form

$$\begin{aligned} &-\varepsilon^2\mu(j_1E_1 + j_2E_2 + j_3E_3)\,dx\,dy\,dz + \varepsilon(\rho E_1 + j_2B_3 - j_3B_2)\,dy\,dz\,dt\\ &\quad + \varepsilon(\rho E_2 + j_3B_1 - j_1B_3)\,dz\,dx\,dt + \varepsilon(\rho E_3 + j_1B_2 - j_2B_1)\,dx\,dy\,dt. \end{aligned} \tag{8}$$


This 3-form is determined by Φ (because D − H dt and J are), and is unchanged by a (proper) Lorentz transformation of coordinates. When there is no net moving charge (j = 0), the first term in the 3-form (8) is zero and the coefficients of the remaining three terms are ερE₁, ερE₂, ερE₃. By Coulomb's law these are ε times the internal forces of the system, i.e., ε times the force exerted by the field E on the charge ρ. Now for any point (x̄, ȳ, z̄, t̄) there is a Lorentz transformation of coordinates in which j = 0 at (x̄, ȳ, z̄, t̄) (let the origin of the coordinates move with the charge), so that the 3-form (8) divided by ε is ‘force’. But, by the postulate of special relativity, if force is to have any physical meaning it must be unchanged by Lorentz transformations. Since ε⁻¹ times the 3-form (8) is force relative to one coordinate system and is unchanged by Lorentz transformations, the only possible definition of force which is consistent with the theory of relativity is ε⁻¹ times (8), that is

$$\begin{aligned} \text{force} = {} & -\varepsilon\mu(j_1E_1 + j_2E_2 + j_3E_3)\,dx\,dy\,dz\\ & + (\rho E_1 + j_2B_3 - j_3B_2)\,dy\,dz\,dt\\ & + (\rho E_2 + j_3B_1 - j_1B_3)\,dz\,dx\,dt\\ & + (\rho E_3 + j_1B_2 - j_2B_1)\,dx\,dy\,dt. \end{aligned}$$
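The coefficients of this force 3-form can be evaluated pointwise. The Python sketch below (hypothetical function names and sample values) computes them and confirms that the last three coefficients are the components of ρE + j × B, while the dx dy dz coefficient is −εμ times the rate of work j₁E₁ + j₂E₂ + j₃E₃.

```python
from fractions import Fraction


def force_3form(rho, j, E, B, eps_mu):
    """Coefficients of the force 3-form, in the order
    (dx dy dz, dy dz dt, dz dx dt, dx dy dt)."""
    j1, j2, j3 = j
    E1, E2, E3 = E
    B1, B2, B3 = B
    rate_of_work = j1 * E1 + j2 * E2 + j3 * E3  # work done by E on the moving charge
    return (-eps_mu * rate_of_work,
            rho * E1 + j2 * B3 - j3 * B2,
            rho * E2 + j3 * B1 - j1 * B3,
            rho * E3 + j1 * B2 - j2 * B1)


def cross(u, w):
    """Ordinary vector cross product u x w."""
    return (u[1] * w[2] - u[2] * w[1],
            u[2] * w[0] - u[0] * w[2],
            u[0] * w[1] - u[1] * w[0])


# Hypothetical values of the fields and charges at one point.
rho, j = 2, (1, 2, 0)
E, B = (3, 0, 1), (0, 0, 5)
eps_mu = Fraction(1, 4)

f = force_3form(rho, j, E, B, eps_mu)
lorentz_force = tuple(rho * e + c for e, c in zip(E, cross(j, B)))
print(f[0])                    # -3/4, i.e. -eps_mu * (j . E)
print(f[1:] == lorentz_force)  # True, i.e. rho E + j x B
```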


The last three components of this expression for the force were well-known before Einstein; they give, in addition to the electrical force ρE, the magnetic force exerted by the magnetic field B on the moving charge j. It was Einstein who showed that his ‘principle of relativity’ implied the existence of the remaining component

$$-\varepsilon\mu(j_1E_1 + j_2E_2 + j_3E_3)\,dx\,dy\,dz.$$

The quantity j₁E₁ + j₂E₂ + j₃E₃ is the ‘rate of work’ being performed by the field E₁ dx + E₂ dy + E₃ dz on the net charge moving at the rate j. Hence the new component of force is −εμ times the rate of change (per time) of the internal energy of the system (per volume). This gives the relativistic meaning of force. In finding the relativistic meaning of F = ma Einstein first wrote

$$F = \frac{d}{dt}(mv)$$

and integrated to obtain

$$\int \text{force}\,dt = \text{change in momentum},$$

where ‘momentum’ is mass times velocity. Now it is natural to describe the flow of mass by a 3-form

$$m\,dx\,dy\,dz - mu\,dy\,dz\,dt - mv\,dz\,dx\,dt - mw\,dx\,dy\,dt,$$

where m is the density of mass and (u, v, w) are the components of its velocity. The last three components are the negative of momentum, which leads to the conclusion that, just as force has a fourth component which was omitted from classical physics, so also momentum has a fourth component and the complete expression should be

$$\text{Momentum} = -m\,dx\,dy\,dz + mu\,dy\,dz\,dt + mv\,dz\,dx\,dt + mw\,dx\,dy\,dt.$$


Newton's law F = ma is the equation

$$\int \text{force}\,dt = \text{change in momentum}$$

applied to the last three components of force and momentum. Applying the same equation to the newly found dx dy dz components gives

$$-\varepsilon\mu\int \frac{d}{dt}(\text{internal energy})\,dt = \text{change in }(-m)$$

or, letting E denote the internal energy and εμ = 1/c², where c is the speed of light,

$$\text{change in } E = \text{change in } mc^2.$$

This shows that energy and mass are ‘essentially alike’. Of course the internal energy E of the system is determined only up to an additive constant, but if it is assumed that E can actually be reduced to zero and that the mass is then zero, it follows that

$$E = mc^2.$$

Exercises

1 Show that if a function u(x, y, z) defined for {x² + y² + z² > R²} is harmonic and radially symmetric, then it is of the form

$$u(x, y, z) = \frac{A}{r} + B,$$

where A, B are constants and r = √(x² + y² + z²). [The 2-form (∂u/∂x) dy dz + (∂u/∂y) dz dx + (∂u/∂z) dx dy is closed. By subtracting a suitable multiple of r⁻¹ from u it can be assumed that its integral over any large sphere is zero. Thus it suffices to show that if u(r) = u(x, y, z) is a radially symmetric harmonic function such that the integral of (∂u/∂x) dy dz + (∂u/∂y) dz dx + (∂u/∂z) dx dy over large spheres is zero, then u = const., i.e. du = 0. But du = u'(r) dr and u'(r) is constant on spheres. Hence the 2-form above is a multiple f(r) of the 2-form x dy dz + y dz dx + z dx dy. Applying Stokes' Theorem gives f(r) = 0. Hence u'(r) = 0 as desired.]


2 Find the electromagnetic field generated by a single particle of charge e moving at uniform velocity v (assume v < c) along the x-axis, passing through (0, 0, 0) at t = 0. [Take a Lorentz transformation of coordinates in which the particle is stationary, find the field in the new coordinates by Coulomb's law, and convert it to the original coordinates.]


3 Prove that all of Maxwell's laws are unchanged by a Lorentz transformation with positive determinant as follows:

(a) Show that it suffices to prove that the rules

$$\varphi\,dx\,dy\,dz - \psi_1\,dy\,dz\,dt - \psi_2\,dz\,dx\,dt - \psi_3\,dx\,dy\,dt \;\longleftrightarrow\; \varphi\,dt - \varepsilon\mu\psi_1\,dx - \varepsilon\mu\psi_2\,dy - \varepsilon\mu\psi_3\,dz,$$

establishing a correspondence between 3-forms and 1-forms, and

$$\begin{aligned} &E_1\,dx\,dt + E_2\,dy\,dt + E_3\,dz\,dt + B_1\,dy\,dz + B_2\,dz\,dx + B_3\,dx\,dy\\ &\quad\longleftrightarrow\; \frac{1}{\mu}\bigl(\varepsilon\mu E_1\,dy\,dz + \varepsilon\mu E_2\,dz\,dx + \varepsilon\mu E_3\,dx\,dy - B_1\,dx\,dt - B_2\,dy\,dt - B_3\,dz\,dt\bigr), \end{aligned}$$

establishing a correspondence between 2-forms and 2-forms, are unchanged by a Lorentz transformation with positive determinant, i.e. ‘transform and correspond’ is the same as ‘correspond and transform’.


(b) Prove that this is true for the particular Lorentz 
transformation (7) of the text. 


(c) Prove that this is true of the particular Lorentz transformation

$$x' = x\cos\theta + y\sin\theta,\qquad y' = -x\sin\theta + y\cos\theta,\qquad z' = z,\qquad t' = t.$$


(d) Prove that this is true of the Lorentz transformation

$$x' = -x,\qquad y' = y,\qquad z' = z,\qquad t' = -t.$$

(e) It is true that every Lorentz transformation with positive determinant is a composition of rotations around coordinate axes in xyz-space, the transformation (7), translations, and the transformation of (d). Hence by appeal to this fact, (a), (b), (c), (d) suffice to prove the theorem. To prove it without appeal to this fact requires more linear algebra than is contained in this book. Readers who have some background in linear algebra can prove the needed facts stated in (a) as follows: The rule

$$(A_1\,dx + A_2\,dy + A_3\,dz + A_4\,dt)\cdot(B_1\,dx + B_2\,dy + B_3\,dz + B_4\,dt) = A_1B_1 + A_2B_2 + A_3B_3 - \varepsilon\mu A_4B_4$$

is a symmetric bilinear form from pairs of 1-forms to functions. It is unchanged by Lorentz transformations. Thus every 1-form determines a map {1-forms} → {functions} in a Lorentz invariant way. A 3-form also determines a map {1-forms} → {functions} by the rule

$$\omega_1 \longmapsto \frac{\omega_1\,\omega_3}{dx\,dy\,dz\,dt},$$

where ω₃ is the 3-form. This is unchanged by Lorentz transformations with positive determinant. The correspondence between 1-forms and 3-forms given in (a) is the correspondence between forms which give the same map {1-forms} → {functions} and is therefore invariant. There is also a Lorentz invariant bilinear form

{2-forms} × {2-forms} → {functions}


which can be used to prove that the correspondence 
of 2-forms is invariant. 
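The invariance of the bilinear form in (e) can be spot-checked for the transformation (7). In this Python sketch the helper names are hypothetical and εμ = 1/4, v = 6/5, γ = 5/4 are illustrative exact values; a 1-form given in primed coordinates is rewritten in (x, y, z, t) by substituting dx'_i = Σ_k a_ik dx_k, and the form is evaluated on both sides. This checks one example only; it is not a proof.

```python
from fractions import Fraction


def form(P, Q, eps_mu):
    """A1B1 + A2B2 + A3B3 - eps_mu A4B4 for 1-forms with coefficient lists P, Q."""
    return P[0] * Q[0] + P[1] * Q[1] + P[2] * Q[2] - eps_mu * P[3] * Q[3]


def to_old_coordinates(L, P_new):
    """Coefficients in (x, y, z, t) of sum_i P_new[i] dx'_i, using dx'_i = sum_k L[i][k] dx_k."""
    return [sum(P_new[i] * L[i][k] for i in range(4)) for k in range(4)]


eps_mu, v, gamma = Fraction(1, 4), Fraction(6, 5), Fraction(5, 4)  # illustrative, exact
boost = [[gamma, 0, 0, -gamma * v], [0, 1, 0, 0], [0, 0, 1, 0],
         [-gamma * eps_mu * v, 0, 0, gamma]]

P_new = [1, 2, 0, 3]    # dx' + 2 dy' + 3 dt'
Q_new = [0, 1, 5, -1]   # dy' + 5 dz' - dt'
P_old = to_old_coordinates(boost, P_new)
Q_old = to_old_coordinates(boost, Q_new)
print(form(P_new, Q_new, eps_mu) == form(P_old, Q_old, eps_mu))  # True
```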


Chapter 9
Further Study of Limits

9.1 The Real Number System

The fundamental operations of calculus, differentiation
and integration, involve limits of real numbers. In order
to give a rigorous definition of these operations one must 
therefore have a precise formulation of the concept of 
real number. This in turn requires a re-examination of the 
concept of number itself. 


Natural Numbers 


The most primitive notion of number occurs in the 
process of counting ‘one, two, three, ...’, which is the 
context in which one first learns these words for number. 
The numbers used in counting, that is, the positive whole 
numbers, are called natural numbers. These numbers are 
natural to us because we are in possession of such an 
efficient system of representing them and of performing the 
operations of arithmetic on them. 

Numbers have been represented in a great variety of 
ways in various civilizations. Vestiges of ancient systems 
of numeration are encountered in the use of Roman 
numerals to denote years (used because MCMLXIX is, 
by its lack of clarity, so much more imposing than 1969), 
in the use of 23° 2’ 18” to denote an angle (from the 




H.M. Edwards, Advanced Calculus: A Differential Forms Approach, Modern Birkhauser Classics, 
DOI 10.1007/978-0-8176-8412-9_9, © Harold M. Edwards 2014 


Chapter9 | Further Study of Limits 


214739
214379
3 < 7
∴ 214379 < 214739

 1  2  3  4
 5  6  7  8
 9 10 11 12
3 × 4 = 12


Greek practice of using a prime to denote the ‘place’ of 
a number, necessitated by the lack of a symbol 0 to fill 
empty ‘places’), and in the primitive but still useful 
method of tallying. These systems have
now been replaced by the decimal notation based on the 
Arabic (originally Hindu) symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 
9. The advantages of this system over all previous ones 
are so great that it has been incorporated into all the 
major languages of the world. 

In decimal notation a natural number is represented by a symbol of the form aₙaₙ₋₁…a₂a₁, consisting of a finite sequence of symbols aᵢ, each of which is one of the ten digits 0, 1, 2, …, 9. (Commas, or, in many languages, periods, are often inserted dividing the digits into threes to increase legibility, e.g. 3,427,182. In theory there is no limit to the number of digits n, although in practice n rarely exceeds fifteen.) Two such symbols aₙaₙ₋₁…a₁ and bₘbₘ₋₁…b₁ are considered to represent the same number if they differ only by the addition or deletion of 0's on the left end. For example, 10 means the same thing as 0010. For this reason all 0's on the left end are normally deleted, so that aₙ ≠ 0. A symbol aₙaₙ₋₁…a₁ in which all digits are 0 is excluded.

Given two such symbols a = aₙaₙ₋₁…a₁ and b = bₘbₘ₋₁…b₁, one writes a < b if the following relation holds: When 0's are added to the left end in such a way that a and b have the same number of digits, a precedes b in the lexicographic ordering based on the ordering 0 < 1 < 2 < ··· < 9 of the digits; that is, reading from the left, the digit of a precedes the corresponding digit of b in the first position where they differ. Thus 1 is the least symbol, i.e. 1 < b for all b ≠ 1; then 2 is the next least, etc., giving the usual order 1 < 2 < ··· < 9 < 10 < 11 < ··· < 99 < 100 < 101 < ··· for the symbols. In this way the symbols serve as ‘counters’ and give a concise way of recording the result of any count, by doing a parallel count of the symbols.
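The ordering rule just described can be sketched in Python on digit strings. The function name is hypothetical; comparing single characters '0' < '1' < ··· < '9' plays the role of the ordering of the digits, and `str.zfill` adds the 0's on the left end.

```python
def precedes(a: str, b: str) -> bool:
    """True if the digit string a precedes b in the ordering described above.

    Both strings are padded with 0's on the left to a common length and then
    read from the left; the first position where they differ decides.
    """
    width = max(len(a), len(b))
    for da, db in zip(a.zfill(width), b.zfill(width)):
        if da != db:
            return da < db  # '0' < '1' < ... < '9' as characters
    return False  # the same symbol after padding, so neither precedes the other


print(precedes("214379", "214739"))  # True: the first difference is 3 < 7
print(precedes("99", "100"))         # True: compare 099 with 100
print(precedes("10", "0010"))        # False: the same number
```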

Given two symbols a and b, one finds a third symbol, 
denoted a + b and called their sum, by the operation 
‘count first to a, then to b, and record the total count’. 
Given two symbols a and b, one finds a third symbol, denoted ab or a × b and called their product, by the operation ‘count to a, repeat b times, and record the total count’.

The relation a < b and the operations a + b, ab 
clearly have the following properties: If symbols a, b are 


9.1 | The Real Number System 




*a > b means, of course, that b < a in the sense defined above.


31 
94 
125 


given, then exactly one of the relations a < b, a = b, 
a > b holds.* The relation a < b holds if and only if 
there is a symbol c such that a + c = b. The commuta- 
tive, associative and distributive laws a + b = b + a, 
ab = ba, (a + b) + c = a + (b + c), (ab)c = a(bc), a(b + c) = ab + ac all hold. Terms can be cancelled from sums and products; that is, if a + c = b + c then a = b, and if ac = bc then a = b. The symbol 1 has the property that 1·a = a for all a.

These properties are in fact properties of the counting process and have nothing to do with decimal representations. Rather, they tell how + and × can be performed as operations on symbols. The crucial observation is that

$$a_n a_{n-1}\cdots a_2 a_1 = a_1 + 10\cdot a_n a_{n-1}\cdots a_2.$$

This implies that

$$a_n a_{n-1}\cdots a_1 = a_1 + 10a_2 + 10^2\cdot a_n a_{n-1}\cdots a_3 = a_1 + 10a_2 + 10^2 a_3 + \cdots + 10^{n-1}a_n,$$

where 10ⁿ⁻¹ means ten multiplied by itself one less than n times (n itself is a natural number). Sums and products are then formed by using the commutative, associative and distributive laws together with an addition table and a multiplication table for one-digit numbers, e.g.

$$31 + 94 = (3\cdot 10 + 1) + (9\cdot 10 + 4) = (3 + 9)\cdot 10 + 1 + 4 = 12\cdot 10 + 5 = 125$$

and

$$\begin{aligned} 31 \times 94 &= (3\cdot 10 + 1)\times(9\cdot 10 + 4) = 3\cdot 9\cdot 10^2 + 3\cdot 4\cdot 10 + 1\cdot 9\cdot 10 + 1\cdot 4\\ &= 27\cdot 10^2 + 12\cdot 10 + 9\cdot 10 + 4 = 2\cdot 10^3 + 8\cdot 10^2 + 11\cdot 10 + 4 = 2914. \end{aligned}$$

In short, sums and products of symbols are found according to the familiar schemes. In fact, one could consider + and × as being defined by these schemes rather than by reference to the counting process. If such a procedure is followed, then the arithmetic of symbols can be defined very explicitly, but the arithmetic of natural numbers which it ‘represents’ remains undefined. This distinction between the arithmetic of symbols and the arithmetic of numbers may seem less artificial when the idea of ‘number’ is generalized beyond the natural numbers and hence can no longer be apprehended so concretely in terms of counting.
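The place-value scheme for sums and products can be sketched in Python, working on digit strings from the right as in the computations of 31 + 94 and 31 × 94 above. The function names are hypothetical, and Python's built-in integer arithmetic on digits and carries stands in for the addition and multiplication tables.

```python
def to_digits(s: str) -> list[int]:
    """Digits a_1, a_2, ..., a_n of the symbol a_n ... a_1 (least significant first)."""
    return [int(ch) for ch in reversed(s)]


def from_digits(ds: list[int]) -> str:
    """Inverse of to_digits, deleting 0's on the left end."""
    s = "".join(str(d) for d in reversed(ds)).lstrip("0")
    return s or "0"


def add(a: str, b: str) -> str:
    """Schoolbook addition: add digits of equal place value, carrying tens leftward."""
    da, db = to_digits(a), to_digits(b)
    result, carry = [], 0
    for i in range(max(len(da), len(db))):
        s = (da[i] if i < len(da) else 0) + (db[i] if i < len(db) else 0) + carry
        result.append(s % 10)
        carry = s // 10
    if carry:
        result.append(carry)
    return from_digits(result)


def mul(a: str, b: str) -> str:
    """Expand by the distributive law, collect one-digit products by power
    of ten, then normalize the coefficients by carrying."""
    da, db = to_digits(a), to_digits(b)
    coeffs = [0] * (len(da) + len(db))  # coeffs[k] multiplies 10**k
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            coeffs[i + j] += x * y
    result, carry = [], 0
    for c in coeffs:
        total = c + carry
        result.append(total % 10)
        carry = total // 10
    while carry:
        result.append(carry % 10)
        carry //= 10
    return from_digits(result)


print(add("31", "94"))  # 125
print(mul("31", "94"))  # 2914
```

For 31 × 94 the collected coefficients are 4, 21, 27 for 10⁰, 10¹, 10², which the carrying step turns into the digits of 2914.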


Rational Numbers 


For example, fractions, negative numbers, and zero 
are normally considered to be numbers. Everyone learns 
how to record such numbers (e.g. −1, ⅔, −4⅞, 0) and
how to add and multiply them, but there is normally
much less of a feeling of security about the meaning of
the result. (The rule (−1)(−1) = 1 is particularly
puzzling to most people.) For this reason, in describing 


*That is, numbers which are ratios, although the term now includes zero and negatives as well.

8q + 40 = 1
16q + 79 = 1
16·8q + 16·40 = 16
16·8q + 8·79 = 8
16 + 8·79 = 8 + 16·40
Checks. ∴ 8q + 40 = 1 and 16q + 79 = 1 describe the same q.


the arithmetic of such numbers, called rational numbers*, 
it is desirable to begin at the beginning and to make as 
few assumptions as possible. 

The arithmetic of rational numbers can be founded on 
the arithmetic of natural numbers (which will now be 
assumed to be completely familiar) by observing that 
every rational number q satisfies an equation of the form 
aq + b = c, where a, b, c are natural numbers, and that 
the triad (a, b, c) of natural numbers uniquely determines 
q. Thus −1 is the only solution of 1·q + 2 = 1, ⅔ is the only solution of 3q + 1 = 3, −4⅞ the only solution of 8q + 50 = 11, 0 the only solution of 1·q + 3 = 3, etc. In other words, rational numbers can be described by triads of natural numbers, the rational numbers −1, ⅔, −4⅞, 0 being described, for example, by the triads (1, 2, 1), (3, 1, 3), (8, 50, 11), (1, 3, 3) respectively.

Of course this description is not unique, that is, the 
same rational number can be described by more than one 
triad. For example, −4⅞ can also be described by the triads (8, 40, 1), (16, 80, 2), (16, 79, 1), (32, 158, 2), etc., as well as by (8, 50, 11). The relation ‘describe the same rational number’ between triads will be denoted ≡, read ‘is congruent to’. Thus (8, 40, 1) ≡ (16, 80, 2) ≡ (8, 50, 11), etc. More generally (a, b, c) ≡ (a, b + d, c + d) and (a, b, c) ≡ (ad, bd, cd). Given (a, b, c), (a', b', c') one can determine whether (a, b, c) ≡ (a', b', c') by rewriting

$$aq + b = c, \qquad a'q' + b' = c'$$

as

$$\begin{aligned} aa'q + ba' + ab' &= ca' + ab'\\ aa'q' + ba' + ab' &= ac' + ba'. \end{aligned} \tag{$*$}$$

If q = q' then the right-hand sides are equal, and conversely; hence

$$(a, b, c) \equiv (a', b', c') \quad\text{if and only if}\quad ca' + ab' = ac' + ba'. \tag{1}$$


This gives a criterion, expressed solely in terms of the 
arithmetic of natural numbers, for determining whether 
two given triads of natural numbers describe the same 
rational number. 

Given two triads (a, b, c), (a’, b’, c’) describing rational 
numbers q, q’, the relation q < q’ and the operations 
q + q’, qq’ can be expressed in terms of the triads as 
follows: If q < q' then the equations (*) give ca' + ab' < ac' + ba'. Moreover, the steps are reversible, and hence

$$(a, b, c) < (a', b', c') \quad\text{if and only if}\quad ca' + ab' < ac' + ba'. \tag{2}$$


Adding the equations (*) gives an equation satisfied by q + q'; hence

$$(a, b, c) + (a', b', c') = (aa',\; ba' + ab',\; ca' + ac'). \tag{3}$$

That is, the rational number described by the triad on the right is the sum of the rational numbers described by the triads on the left. In the same way (aq + b)(a'q' + b') = cc' gives aa'qq' + ba'q' + b'aq + bb' = cc', aa'qq' + b(a'q' + b') + b'(aq + b) = cc' + bb', aa'qq' + bc' + b'c = cc' + bb'. Hence

$$(a, b, c)(a', b', c') = (aa',\; bc' + cb',\; cc' + bb') \tag{4}$$

gives the rule for forming products.
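The rules (1) to (4) translate directly into code on triads of natural numbers. This Python sketch (hypothetical function names) reproduces some examples from the text: the several descriptions of −4⅞, and a sum, a product and a comparison of the triads (1, 2, 1) and (3, 1, 3), which describe −1 and ⅔.

```python
# A triad (a, b, c) of natural numbers describes the rational q with a*q + b = c.

def congruent(p, q):
    """Rule (1): (a, b, c) ≡ (a', b', c') iff ca' + ab' = ac' + ba'."""
    (a, b, c), (a1, b1, c1) = p, q
    return c * a1 + a * b1 == a * c1 + b * a1


def less(p, q):
    """Rule (2): (a, b, c) < (a', b', c') iff ca' + ab' < ac' + ba'."""
    (a, b, c), (a1, b1, c1) = p, q
    return c * a1 + a * b1 < a * c1 + b * a1


def add(p, q):
    """Rule (3)."""
    (a, b, c), (a1, b1, c1) = p, q
    return (a * a1, b * a1 + a * b1, c * a1 + a * c1)


def mul(p, q):
    """Rule (4)."""
    (a, b, c), (a1, b1, c1) = p, q
    return (a * a1, b * c1 + c * b1, c * c1 + b * b1)


# Several descriptions of -4 7/8 from the text.
descriptions = [(8, 50, 11), (8, 40, 1), (16, 80, 2), (16, 79, 1), (32, 158, 2)]
print(all(congruent(descriptions[0], t) for t in descriptions))  # True

minus_one, two_thirds = (1, 2, 1), (3, 1, 3)
print(add(minus_one, two_thirds))   # (3, 7, 6): 3q + 7 = 6, so q = -1/3
print(mul(minus_one, minus_one))    # (1, 4, 5): q + 4 = 5, so q = 1
print(less(minus_one, two_thirds))  # True
```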

The rules (1), (2), (3), (4) completely describe the 
arithmetic of rational numbers in terms of the arithmetic 
of natural numbers. In the absence of a definition of ‘the 
arithmetic of rational numbers’ this statement has no 
meaning. However, it can be reversed and the ‘arithmetic 
of rational numbers’ can be defined to be ‘that which the 
rules (1), (2), (3), (4) describe’. This arithmetic is as 
follows: 

The arithmetic applies to triads of natural numbers. Two triads are considered to be ‘the same’ in the arithmetic if they are congruent in the sense defined by the rule (1). This use of the word ‘same’ is justified only if its basic meaning is valid, that is, only if it is true that ‘(a, b, c) is the same as (a, b, c)’, that ‘if (a, b, c) is the same as (a', b', c') then (a', b', c') is the same as (a, b, c)’, and that ‘if (a, b, c) is the same as (a', b', c') and (a', b', c') is the same as (a'', b'', c'') then (a, b, c) is the same as (a'', b'', c'')’. In other words the relation ≡ defined by (1) must be shown to have the properties

$$\begin{aligned} &(a, b, c) \equiv (a, b, c) && \text{(reflexive)}\\ &(a, b, c) \equiv (a', b', c') \text{ implies } (a', b', c') \equiv (a, b, c) && \text{(symmetric)}\\ &(a, b, c) \equiv (a', b', c') \text{ and } (a', b', c') \equiv (a'', b'', c'') \text{ imply } (a, b, c) \equiv (a'', b'', c'') && \text{(transitive)} \end{aligned} \tag{5}$$


The first two of these statements are the tautologies ‘ca + ab = ac + ba’ and ‘ca' + ab' = ac' + ba' implies c'a + a'b = a'c + b'a’. The third says ‘ca' + ab' = ac' + ba' and c'a'' + a'b'' = a'c'' + b'a'' imply ca'' + ab'' = ac'' + ba''’, which is proved by multiplying the first equation by a'', the second by a, adding, and performing cancellations.

(2, 7, 5) ≡ (6, 21, 15) ≡ (6, 12, 6) ≡ (1, 2, 1)

Given two triads (a, b, c), (a', b', c') one writes (a, b, c) < (a', b', c') if ca' + ab' < ac' + ba'. This relation < is consistent with the convention that congruent triads are to be considered ‘the same’ because

$$(a, b, c) \equiv (d, e, f),\quad (a', b', c') \equiv (d', e', f')\quad\text{and}\quad (a, b, c) < (a', b', c') \quad\text{imply}\quad (d, e, f) < (d', e', f'). \tag{6}$$

Although this can be proved directly without difficulty, the following observation simplifies the proof somewhat: If (a, b, c) ≡ (a', b', c') then it is possible to go from (a, b, c) to (a', b', c') by a finite number of applications (in fact four applications) of the rules (a, b, c) ≡ (ga, gb, gc), (a, b, c) ≡ (a, b + g, c + g). [Explicitly, (a, b, c) ≡ (aa', ba', ca') ≡ (aa', ba' + ab', ca' + ab') = (aa', ba' + ab', ac' + ba') ≡ (aa', ab', ac') ≡ (a', b', c').] Therefore in checking (6) it suffices to consider the cases (d, e, f) = (ga, gb, gc), (d, e, f) = (a, b + g, c + g) and similarly (d', e', f') = (g'a', g'b', g'c'), (d', e', f') = (a', b' + g', c' + g'). In these cases (6) is immediate, e.g. (ga, gb, gc) < (a', b', c') means gca' + gab' < gac' + gba', which, cancelling g, implies (a, b, c) < (a', b', c') and conversely.

Triads are added in the arithmetic by the rule (3). This definition is justified only if the sum is ‘the same’ whenever the summands are ‘the same’, that is, only if it is true that

$$(a, b, c) \equiv (d, e, f) \text{ and } (a', b', c') \equiv (d', e', f') \quad\text{imply}\quad (a, b, c) + (a', b', c') \equiv (d, e, f) + (d', e', f'), \tag{7}$$

where ≡ and + are defined by (1) and (3). Again it suffices to consider the cases (d, e, f) = (ag, bg, cg), (d, e, f) = (a, b + g, c + g), for which (7) is immediately verified.

Triads are multiplied by the rule (4). Again this definition must be justified by showing that the product is ‘the same’ whenever the factors are ‘the same’, that is,

$$(a, b, c) \equiv (d, e, f) \text{ and } (a', b', c') \equiv (d', e', f') \quad\text{imply}\quad (a, b, c)\cdot(a', b', c') \equiv (d, e, f)\cdot(d', e', f'). \tag{8}$$


This is again easily verified in the cases (d, e, f) = (ag, bg, cg) and (d, e, f) = (a, b + g, c + g), which suffice to prove (8) in general.

The arithmetic defined by (1), (2), (3), (4) and justified by (5), (6), (7), (8) is now seen to have the following properties:

Given two triads (a, b, c), (a', b', c'), exactly one of the relations (a, b, c) < (a', b', c'), (a, b, c) ≡ (a', b', c'), (a, b, c) > (a', b', c') holds. This is called the trichotomy law. It is true because exactly one of the relations <, =, > holds for the natural numbers ca' + ab' and ac' + ba'.

If three triads q = (a, b, c), q' = (a', b', c'), q'' = (a'', b'', c'') are given, then the commutative, associative, and distributive laws q + q' ≡ q' + q, qq' ≡ q'q, q + (q' + q'') ≡ (q + q') + q'', q(q'q'') ≡ (qq')q'', q(q' + q'') ≡ qq' + qq'' all hold. These are verified by direct computation. For example, the distributive law is proved by

$$\begin{aligned} (a,b,c)\,[(a',b',c') + (a'',b'',c'')] &= (a,b,c)(a'a'',\; b'a'' + a'b'',\; c'a'' + a'c'')\\ &= (aa'a'',\; bc'a'' + ba'c'' + cb'a'' + ca'b'',\; cc'a'' + ca'c'' + bb'a'' + ba'b''),\\ (a,b,c)(a',b',c') + (a,b,c)(a'',b'',c'') &= (aa',\; bc' + cb',\; cc' + bb') + (aa'',\; bc'' + cb'',\; cc'' + bb'')\\ &= (aa'aa'',\; bc'aa'' + cb'aa'' + aa'bc'' + aa'cb'',\; cc'aa'' + bb'aa'' + aa'cc'' + aa'bb'')\\ &\equiv (a,b,c)\,[(a',b',c') + (a'',b'',c'')]. \end{aligned}$$

With q = (3, 1, 2), q' = (2, 2, 3), q'' = (1, 2, 1), for instance:

q(q' + q'') = (3, 1, 2)(2, 2 + 4, 3 + 2) = (3, 1, 2)(2, 6, 5) = (6, 5 + 12, 10 + 6) = (6, 17, 16)
qq' + qq'' = (3, 1, 2)(2, 2, 3) + (3, 1, 2)(1, 2, 1) = (6, 3 + 4, 6 + 2) + (3, 1 + 4, 2 + 2) = (6, 7, 8) + (3, 5, 4) = (18, 21 + 30, 24 + 24) = (18, 51, 48) ≡ (6, 17, 16)
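The statements (5) to (8), the trichotomy law, and the commutative, associative and distributive laws can be spot-checked by brute force over all triads whose entries are 1, 2 or 3. This Python sketch uses hypothetical function names; an exhaustive check over a small finite range is evidence, not a proof.

```python
from itertools import product


def congruent(p, q):  # rule (1)
    (a, b, c), (a1, b1, c1) = p, q
    return c * a1 + a * b1 == a * c1 + b * a1


def less(p, q):  # rule (2)
    (a, b, c), (a1, b1, c1) = p, q
    return c * a1 + a * b1 < a * c1 + b * a1


def add(p, q):  # rule (3)
    (a, b, c), (a1, b1, c1) = p, q
    return (a * a1, b * a1 + a * b1, c * a1 + a * c1)


def mul(p, q):  # rule (4)
    (a, b, c), (a1, b1, c1) = p, q
    return (a * a1, b * c1 + c * b1, c * c1 + b * b1)


# All 27 triads with entries in 1..3, and all congruent pairs among them.
triads = list(product(range(1, 4), repeat=3))
pairs = [(p, d) for p in triads for d in triads if congruent(p, d)]

# (5): reflexive, symmetric, transitive.
assert all(congruent(p, p) for p in triads)
assert all(congruent(d, p) for p, d in pairs)
assert all(congruent(p, r) for p, q in pairs for q2, r in pairs if q2 == q)

# (6), (7), (8): <, + and x do not depend on which congruent triads are used.
for (p, d), (p1, d1) in product(pairs, repeat=2):
    assert less(p, p1) == less(d, d1)
    assert congruent(add(p, p1), add(d, d1))
    assert congruent(mul(p, p1), mul(d, d1))

# Trichotomy, then the commutative, associative and distributive laws.
for p, q in product(triads, repeat=2):
    assert [less(p, q), congruent(p, q), less(q, p)].count(True) == 1
for q, q1, q2 in product(triads, repeat=3):
    assert congruent(add(q, q1), add(q1, q)) and congruent(mul(q, q1), mul(q1, q))
    assert congruent(add(q, add(q1, q2)), add(add(q, q1), q2))
    assert congruent(mul(q, mul(q1, q2)), mul(mul(q, q1), q2))
    assert congruent(mul(q, add(q1, q2)), add(mul(q, q1), mul(q, q2)))
print("all checks passed for", len(triads), "triads")
```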


$$\bar a\,\bar b = (1, 1, a + 1)(1, 1, b + 1) = (1,\; (b + 1) + (a + 1),\; (a + 1)(b + 1) + 1) \equiv (1, 1, ab + 1) = \overline{ab}$$


*Note that there are two different kinds of addition here and two different kinds of multiplication, namely, addition and multiplication of natural numbers (a + b, ab) and addition and multiplication of triads (ā + b̄, ā b̄).


Given a natural number a, let ā denote the triad (1, 1, a + 1). Such a triad (or, of course, any triad which is ‘the same’ as such a triad, i.e. congruent to such a triad) is called a positive integer. Now ā < b̄ if and only if a < b, and ā ≡ b̄ if and only if a = b. Moreover, $\overline{a+b} \equiv \bar a + \bar b$ and $\overline{ab} \equiv \bar a\,\bar b$.* Hence the arithmetic of positive integers has all the properties of the arithmetic of natural numbers.

If it is given that q = (a, b, c), then it is to be expected that āq + b̄ ≡ c̄. This is immediately verified:

$$\begin{aligned} \bar a q &= (1, 1, a + 1)(a, b, c) = (a,\; c + ab + b,\; ac + c + b) \equiv (a, ab, ac) \equiv (1, b, c)\\ \bar a q + \bar b &= (1, b, c) + (1, 1, b + 1) = (1,\; b + 1,\; c + b + 1) \equiv (1, 1, c + 1) = \bar c. \end{aligned}$$

Thus every q satisfies an equation of the form āq + b̄ ≡ c̄ where ā, b̄, c̄ are positive integers. Moreover, if q' = (a', b', c') is any solution of āq' + b̄ ≡ c̄, then this equation together with ā'q' + b̄' ≡ c̄' gives ca' + ab' = ac' + ba' exactly as in the derivation of (1). Hence ca' + ab' = ac' + ba', that is, q ≡ q'. In other words, within the arithmetic the solution q of āq + b̄ ≡ c̄ is unique, meaning that any two solutions are congruent.
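The claims about positive integers can be checked in the same way. In this Python sketch `bar(n)` is a hypothetical name for the triad (1, 1, n + 1), and the other helpers repeat the rules (1) to (4); the ranges checked are finite, so this is a spot-check, not a proof.

```python
def congruent(p, q):  # rule (1)
    (a, b, c), (a1, b1, c1) = p, q
    return c * a1 + a * b1 == a * c1 + b * a1


def less(p, q):  # rule (2)
    (a, b, c), (a1, b1, c1) = p, q
    return c * a1 + a * b1 < a * c1 + b * a1


def add(p, q):  # rule (3)
    (a, b, c), (a1, b1, c1) = p, q
    return (a * a1, b * a1 + a * b1, c * a1 + a * c1)


def mul(p, q):  # rule (4)
    (a, b, c), (a1, b1, c1) = p, q
    return (a * a1, b * c1 + c * b1, c * c1 + b * b1)


def bar(n):
    """The triad (1, 1, n + 1): the positive integer corresponding to the natural number n."""
    return (1, 1, n + 1)


N = range(1, 25)
# bar turns +, x, < and = of natural numbers into +, x, < and ≡ of triads.
assert all(congruent(add(bar(a), bar(b)), bar(a + b)) for a in N for b in N)
assert all(congruent(mul(bar(a), bar(b)), bar(a * b)) for a in N for b in N)
assert all(less(bar(a), bar(b)) == (a < b) for a in N for b in N)
assert all(congruent(bar(a), bar(b)) == (a == b) for a in N for b in N)

# Every triad q = (a, b, c) satisfies bar(a) q + bar(b) ≡ bar(c); here q describes -4 7/8.
a, b, c = q = (8, 50, 11)
print(congruent(add(mul(bar(a), q), bar(b)), bar(c)))  # True
```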
Consider now the problem of solving an equation of the form qx + q' ≡ q'' for x, given q, q', q''. It is to be expected that the arithmetic will contain a unique element 0 such that solution is not (in general) possible when q ≡ 0 but such that there is a unique solution x of qx + q' ≡ q'' whenever q ≢ 0. This can be verified as follows: One would expect* 0 to be represented by the triad (1, 1, 1). It is easily verified that (a, b, c) ≡ (1, 1, 1) if and only if b = c; moreover, 0·q ≡ 0 and 0 + q ≡ q for all q, where 0 = (1, 1, 1):

(a, b, c)(1, 1, 1) = (a, b + c, c + b) ≡ (1, 1, 1),   (a, b, c) + (1, 1, 1) = (a, b + a, c + a) ≡ (a, b, c).

Thus 0·x + q' ≡ q'' has no solution x unless q' ≡ q'', in which case any x is a solution.

*Because 1·0 + 1 = 1.

Given three triads q, q', q'' one can first replace them with congruent triads of the form q = (a, b, c), q' = (a, b, c'), q'' = (a, b, c''). (If q = (A, B, C), q' = (A', B', C'), q'' = (A'', B'', C''), set a = AA'A'', b = AA'B'' + AB'A'' + BA'A''.) Moreover, if q ≢ 0 then b ≠ c, and hence either b + d = c or b = d + c, so that q is either of the form q ≡ (a, 1, d + 1) or of the form q ≡ (a, d + 1, 1). The solution of qx + q' ≡ q'' will be considered for these two cases separately.

If q ≡ (a, 1, d + 1), q' = (a, b, c'), q'' = (a, b, c''), then āq ≡ (1, 1, d + 1) = d̄ and āq' + b̄ ≡ c̄', āq'' + b̄ ≡ c̄'', as was seen above. Hence qx + q' ≡ q'' implies

$$\begin{aligned} \bar a q x + \bar a q' + \bar b &\equiv \bar a q'' + \bar b\\ \bar d x + \bar c' &\equiv \bar c''. \end{aligned}$$

Since this equation has the unique solution x = (d, c', c''), the solution x of qx + q' ≡ q'', if it exists, can only be this triad (or, of course, a triad congruent to this one). Conversely, for this x one easily verifies

$$qx + q' = (ad,\; dc' + c' + c'',\; dc'' + c'' + c') + (a, b, c') \equiv (a, c', c'') + (a, b, c') \equiv (a, b, c'') = q''.$$

(a, b, c) + (a, b', c') = (aa, ba + ab', ca + ac') ≡ (a, b + b', c + c')

If q ≡ (a, d + 1, 1) then āq + d̄ ≡ 0, and hence qx + q' ≡ q'' implies

$$\begin{aligned} \bar a q x + \bar a q' + \bar b + \bar d x &\equiv \bar a q'' + \bar b + \bar d x\\ \bar c' &\equiv \bar c'' + \bar d x. \end{aligned}$$

Hence qx + q' ≡ q'' implies x ≡ (d, c'', c'). It is easily shown as above that this x is indeed a solution, which completes the proof of the elimination theorem: If q, q', q'' are given with q ≢ 0, then there is a unique solution x of the equation qx + q' ≡ q''.
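The proof of the elimination theorem is constructive, so it can be turned into a procedure on triads. This Python sketch follows the two cases of the text: it first rewrites q, q', q'' with a common first entry a and a common second entry b, then returns (d, c', c'') or (d, c'', c') according as c > b or c < b. The function names are hypothetical.

```python
def congruent(p, q):  # rule (1)
    (a, b, c), (a1, b1, c1) = p, q
    return c * a1 + a * b1 == a * c1 + b * a1


def add(p, q):  # rule (3)
    (a, b, c), (a1, b1, c1) = p, q
    return (a * a1, b * a1 + a * b1, c * a1 + a * c1)


def mul(p, q):  # rule (4)
    (a, b, c), (a1, b1, c1) = p, q
    return (a * a1, b * c1 + c * b1, c * c1 + b * b1)


def solve(q, q1, q2):
    """Return a triad x with q x + q1 ≡ q2, following the two cases in the text.

    Raises ValueError if q ≡ 0, the case excluded by the elimination theorem.
    """
    (A, B, C), (A1, B1, C1), (A2, B2, C2) = q, q1, q2
    # Congruent forms q ≡ (a, b, c), q1 ≡ (a, b, c1), q2 ≡ (a, b, c2) with
    # a = A*A1*A2; the common first entry a does not appear in the answer.
    b = A * A1 * B2 + A * B1 * A2 + B * A1 * A2
    c = C * A1 * A2 + A * B1 * A2 + A * A1 * B2
    c1 = A * C1 * A2 + B * A1 * A2 + A * A1 * B2
    c2 = A * A1 * C2 + B * A1 * A2 + A * B1 * A2
    if c > b:   # q ≡ (a, 1, d + 1), so d x + c1 ≡ c2
        return (c - b, c1, c2)
    if c < b:   # q ≡ (a, d + 1, 1), so c1 ≡ c2 + d x
        return (b - c, c2, c1)
    raise ValueError("q is congruent to 0")


q, q1, q2 = (3, 1, 2), (2, 2, 3), (1, 2, 1)   # describe 1/3, 1/2 and -1
x = solve(q, q1, q2)
print(x, congruent(add(mul(q, x), q1), q2))   # (2, 23, 14) True
```

The result (2, 23, 14) describes −9/2, which indeed satisfies (1/3)x + 1/2 = −1.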

The elimination theorem can be used to prove all the familiar facts about subtraction and division: The unique solution x of q + x ≡ 0 is denoted by x = −q. The identities −(−q) ≡ q, −(q₁ + q₂) ≡ (−q₁) + (−q₂), −(q₁q₂) ≡ (−q₁)q₂ ≡ q₁(−q₂), (−q₁)(−q₂) ≡ −[q₁(−q₂)] ≡ −[−q₁q₂] ≡ q₁q₂ are deduced immediately from this definition. The symbol q₁ − q₂ denotes q₁ + (−q₂). Since 1̄·(1̄·q) ≡ (1̄·1̄)q ≡ 1̄·q, it follows from the uniqueness statement of the elimination theorem that 1̄·q ≡ q for all q. The unique solution x of the equation qx ≡ 1̄ (q ≢ 0) is denoted by x = 1̄/q or x = q⁻¹. The identities (q⁻¹)⁻¹ ≡ q, (q₁q₂)⁻¹ ≡ q₁⁻¹q₂⁻¹, (−q)⁻¹ ≡ −q⁻¹ are deduced immediately from this definition. The symbol q₂/q₁ denotes q₂(q₁)⁻¹. Now the unique solution x of qx + q' ≡ q'' (q ≢ 0) can be written as (q'' − q')/q, meaning, of course, (q'' + (−q'))q⁻¹. In particular, q = (a, b, c) can be written as (c̄ − b̄)/ā.
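The statement that q = (a, b, c) can be written as (c̄ − b̄)/ā suggests an independent check of the whole construction against Python's standard `fractions.Fraction` type, which here serves only as an outside reference, not as part of the construction. The helper names are hypothetical; `triad` picks one particular description of a given rational number, keeping b and c at least 1 so that all entries are natural numbers.

```python
from fractions import Fraction


def congruent(p, q):  # rule (1)
    (a, b, c), (a1, b1, c1) = p, q
    return c * a1 + a * b1 == a * c1 + b * a1


def less(p, q):  # rule (2)
    (a, b, c), (a1, b1, c1) = p, q
    return c * a1 + a * b1 < a * c1 + b * a1


def add(p, q):  # rule (3)
    (a, b, c), (a1, b1, c1) = p, q
    return (a * a1, b * a1 + a * b1, c * a1 + a * c1)


def mul(p, q):  # rule (4)
    (a, b, c), (a1, b1, c1) = p, q
    return (a * a1, b * c1 + c * b1, c * c1 + b * b1)


def value(q):
    """The rational number (c - b)/a described by the triad q = (a, b, c)."""
    a, b, c = q
    return Fraction(c - b, a)


def triad(x):
    """One triad (a, b, c) of natural numbers describing the rational number x."""
    x = Fraction(x)
    n, a = x.numerator, x.denominator   # Fraction keeps the denominator positive
    b = 1 if n >= 0 else 1 - n          # chosen so that b >= 1 and c = b + n >= 1
    return (a, b, b + n)


print(triad(Fraction(-39, 8)), triad(Fraction(2, 3)), triad(-1))
# (8, 40, 1) (3, 1, 3) (1, 2, 1)

# value() turns ≡, <, + and x of triads into =, <, + and * of Fractions.
small = [(a, b, c) for a in range(1, 5) for b in range(1, 5) for c in range(1, 5)]
for p in small:
    for q in small:
        assert congruent(p, q) == (value(p) == value(q))
        assert less(p, q) == (value(p) < value(q))
        assert value(add(p, q)) == value(p) + value(q)
        assert value(mul(p, q)) == value(p) * value(q)
```

The triads produced for −4⅞, ⅔ and −1 are exactly the ones used in the text.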

A triad $q = (a, b, c)$ satisfies $q > \bar{0}$ if and only if $a + b < a + c$, that is, if and only if there is a natural number d such that $c = b + d$, $\bar{a}q + \bar{b} \equiv \bar{b} + \bar{d}$, $\bar{a}q \equiv \bar{d}$. It follows that q is positive ($q > \bar{0}$) if and only if q satisfies an equation of the form $\bar{a}q \equiv \bar{d}$ where a, d are positive integers. This observation implies that if $q_1 > \bar{0}$, $q_2 > \bar{0}$ then $q_1 + q_2 > \bar{0}$, $q_1q_2 > \bar{0}$. It is to be expected that $q_1 < q_2$ if and only if $q_2 - q_1 > \bar{0}$; to verify this, set $q_3 = q_2 - q_1$ and observe that the statement to be proved is that $q_1 + q_3 > q_1$ if and only if $q_3 > \bar{0}$. By setting $q_1 = (a, b, c)$, $q_3 = (a, b', c')$, this becomes ‘$(a, b + b', c + c') > (a, b, c)$ if $c' > b'$, and conversely’, which is immediately verified. The familiar facts about inequalities can now be deduced: $q_1 + q_2 < q_1 + q_3$ if and only if $q_2 < q_3$; if $q_1 < q_2$ and $q_2 < q_3$ then $q_1 < q_3$; if $q_1 > \bar{0}$ and $q_2 < q_3$ then $q_1q_2 < q_1q_3$; if $q_1 > \bar{0}$ then $-q_1 < \bar{0}$, and conversely.
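The claim that $q_1 < q_2$ exactly when $q_2 - q_1 > \bar{0}$ can be spot-checked by machine. This is a sketch only: it assumes the triad encoding used above (the q with $aq + b = c$), takes positivity to be $b < c$ as the text derives, negates by swapping b and c, and compares the result against ordinary fraction order for every triad with entries 1 to 4. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import NamedTuple


class Triad(NamedTuple):
    a: int  # assumed encoding: the q with a*q + b = c
    b: int
    c: int


def is_positive(q: Triad) -> bool:
    # q > 0 iff a + b < a + c, i.e. iff b < c
    return q.b < q.c


def neg(q: Triad) -> Triad:
    # (a, c, b) solves q + x ≡ 0: the sum has equal second and third entries
    return Triad(q.a, q.c, q.b)


def add(p: Triad, q: Triad) -> Triad:
    return Triad(p.a * q.a, p.a * q.b + q.a * p.b, p.a * q.c + q.a * p.c)


def less(p: Triad, q: Triad) -> bool:
    # p < q defined through positivity of q - p
    return is_positive(add(q, neg(p)))


def value(q: Triad) -> Fraction:
    return Fraction(q.c - q.b, q.a)


triads = [Triad(a, b, c) for a, b, c in product(range(1, 5), repeat=3)]
assert all(less(p, q) == (value(p) < value(q)) for p in triads for q in triads)
```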

The absolute value $|q|$ of q is defined to be q if $q > \bar{0}$, $\bar{0}$ if $q \equiv \bar{0}$, $-q$ if $q < \bar{0}$. Then in all cases $|q| \ge \bar{0}$, $|q| \ge q$, and $|-q| \equiv |q|$. The identity $|q_1q_2| \equiv |q_1|\,|q_2|$ is proved by choosing signs ± so that $\pm q_1$ and $\pm q_2$ are both positive (the case in which one of them is zero being trivial), hence $|q_1q_2| \equiv |(\pm q_1)(\pm q_2)| \equiv |\pm q_1|\,|\pm q_2| \equiv |q_1|\,|q_2|$. To prove the triangle inequality $|q_1 + q_2| \le |q_1| + |q_2|$ one can assume, by changing both signs if necessary, that $q_1 + q_2 \ge \bar{0}$; then $|q_1 + q_2| \equiv q_1 + q_2 \le |q_1| + |q_2|$.


Chapter 9 | Further Study of Limits

In short, the arithmetic of rational numbers defined by the rules (1), (2), (3), (4) has all the expected properties. It is convenient now to drop the use of the bars and to write 1, 2, 3, ... meaning elements of the extended arithmetic. Also, the symbol ≡ will be replaced by the more familiar symbol =.

An ‘object’ q in this extended arithmetic is called a 
‘rational number’. Thus ‘rational number’ is, like 
‘natural number’, essentially an undefined concept. Rules 
have been given for representing rational numbers, for 
ordering them, and for forming sums, differences, 
products, and quotients, but no statements about what 
is being represented have been made. It is not the 
‘numbers’ but the operations and relations among them 
which are paramount. 

An integer is a rational number q which satisfies an 
equation of the form q + b = c where b, c are positive 
integers; in other words the integers are the positive 
integers, zero, and minus positive integers, i.e. negative 
integers. 

It can be shown that every rational number q can be 
written uniquely as 


$q = \pm\big((\text{a positive integer or zero}) + (\text{a proper fraction in lowest terms or zero})\big)$


where a ‘proper fraction in lowest terms’ is defined to be 
an expression of the form a/b where a < b and a, b are 
positive integers without common factors. The usual 
arithmetic operations applied to such expressions are 
then justified as operations in the arithmetic of rational 
numbers as defined by the rules (1), (2), (3), (4). 

An important property of the arithmetic of rational numbers is the Archimedean law: If positive rational numbers $q_1, q_2$ are given, then there is a natural number (positive integer) n such that $nq_1 > q_2$. A useful variant of this law is: If $q_1, q_2$ are given with $|q_1| < 1$ and $q_2 > 0$, then there is a natural number n such that $|q_1^n| < q_2$ (where $q_1^n$ means $q_1 \cdot q_1 \cdots q_1$, n times). These facts are easily proved (Exercise 11).
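Both forms of the Archimedean law are existence statements, but for rational inputs the n can be found by a finite search. The sketch below (hypothetical helper names, exact arithmetic via Python's `fractions.Fraction`) returns the smallest such n in each case; the `while` loop in `power_n` terminates precisely because of the second form of the law.

```python
import math
from fractions import Fraction


def archimedean_n(q1: Fraction, q2: Fraction) -> int:
    """Smallest natural number n with n*q1 > q2, for positive rationals q1, q2."""
    if q1 <= 0 or q2 <= 0:
        raise ValueError("q1 and q2 must be positive")
    return math.floor(q2 / q1) + 1


def power_n(q1: Fraction, q2: Fraction) -> int:
    """Smallest natural number n with |q1|**n < q2, given |q1| < 1 and q2 > 0."""
    if not (abs(q1) < 1 and q2 > 0):
        raise ValueError("need |q1| < 1 and q2 > 0")
    n, power = 1, abs(q1)
    while power >= q2:          # power always equals |q1|**n
        n, power = n + 1, power * abs(q1)
    return n


print(archimedean_n(Fraction(1, 3), Fraction(5)))     # 16
print(power_n(Fraction(9, 10), Fraction(1, 100)))     # 44
```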

In actual practice the arithmetic of rational numbers is quite cumbersome and one often uses decimal fractions instead. A decimal fraction is a rational number which can be written with a denominator which is a power of ten, that is, a rational number q which satisfies an equation of the form

$$10^n q = \text{integer}$$

where n is a natural number. Using the familiar decimal point notation, the arithmetic of decimal fractions is reduced to the arithmetic of natural numbers with the small added complications of keeping track of the sign and of the position of the decimal point. Every rational number can be approximated arbitrarily closely by ‘rounding’ to a decimal fraction. (More specifically, if a rational number q and a natural number n are given, then there is a decimal fraction q′ with n places (at most) to the right of the decimal point such that $|q - q'| \le \tfrac{1}{2} \cdot 10^{-n}$. This approximation process is called ‘rounding to n places’.) The ease of computation with decimal fractions usually compensates for the loss of accuracy resulting from rounding.
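Rounding to n places can be carried out exactly on rational numbers. The following is a minimal sketch, assuming the convention that halves round upward (the text does not fix a convention for ties); the helper name is hypothetical.

```python
import math
from fractions import Fraction


def round_to_places(q: Fraction, n: int) -> Fraction:
    """Decimal fraction with at most n places such that |q - result| <= (1/2)*10**-n.
    Ties are rounded upward (an assumed convention)."""
    scale = 10 ** n
    return Fraction(math.floor(q * scale + Fraction(1, 2)), scale)


for q in (Fraction(1, 3), Fraction(2, 3), Fraction(-22, 7)):
    r = round_to_places(q, 3)
    assert abs(q - r) <= Fraction(1, 2 * 10 ** 3)
    print(q, "->", float(r))
```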


Real Numbers 


The arithmetic of rational numbers cannot be used as 
a basis for calculus because a limit of rational numbers 
need not be a rational number. This is illustrated by the 
following examples: 

Newton’s method (§7.3) gives the sequence $1, \tfrac{3}{2}, \tfrac{17}{12}, \ldots, q_{n+1} = \tfrac{1}{2}\left(q_n + \tfrac{2}{q_n}\right), \ldots$ of rational numbers converging to $\sqrt{2}$. However $\sqrt{2}$ is not a rational number because the square of a fraction in lowest terms is a fraction in lowest terms. Hence $p^2/q^2 = 2$ could occur only if $q^2 = 1$, $p^2 = 2$, which is impossible.
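The Newton sequence can be generated in exact rational arithmetic. The sketch below (hypothetical function name) reproduces the terms $1, 3/2, 17/12, \ldots$ and prints $q_n^2 - 2$, which never vanishes (as the irrationality argument requires) but shrinks very quickly.

```python
from fractions import Fraction


def newton_sqrt2(count: int) -> list[Fraction]:
    """First `count` terms of q_1 = 1, q_{n+1} = (q_n + 2/q_n)/2, computed exactly."""
    terms = [Fraction(1)]
    while len(terms) < count:
        q = terms[-1]
        terms.append((q + 2 / q) / 2)
    return terms


for q in newton_sqrt2(5):
    print(q, float(q * q - 2))
```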


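The number $e = 1 + 1 + \tfrac{1}{2!} + \tfrac{1}{3!} + \tfrac{1}{4!} + \cdots$ is the limit of the sequence of rational numbers $1, 2, 2\tfrac{1}{2}, 2\tfrac{2}{3}, 2\tfrac{17}{24}, \ldots, q_{n+1} = q_n + \tfrac{1}{n!}, \ldots$. However, if e were rational then $a!\,e$ would be an integer for all sufficiently large integers a; but this would give

$$a!\,e = a!\left(1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{a!}\right) + \left(\frac{1}{a+1} + \frac{1}{(a+1)(a+2)} + \cdots\right)$$
$$< \text{integer} + \left(\frac{1}{a+1} + \frac{1}{(a+1)^2} + \cdots\right) = \text{integer} + \frac{1}{a},$$

that is, $a!\,e = \text{integer} + (\text{positive number smaller than } \tfrac{1}{a})$, which is impossible.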
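The two halves of the irrationality argument can be checked in exact arithmetic for small a. This sketch (hypothetical helper names) confirms that $a!\,(1 + 1 + \cdots + 1/a!)$ is an integer and that a long finite piece of the remaining part lies strictly between 0 and $1/a$; a computer check of finitely many terms is, of course, an illustration and not the proof.

```python
from fractions import Fraction
from math import factorial


def e_partial(n: int) -> Fraction:
    """q_n = 1 + 1 + 1/2! + ... + 1/(n-1)!, i.e. q_1 = 1 and q_{n+1} = q_n + 1/n!."""
    return sum((Fraction(1, factorial(k)) for k in range(n)), Fraction(0))


def tail_after_a(a: int, extra_terms: int = 30) -> Fraction:
    """a! * (1/(a+1)! + ... + 1/(a+extra_terms)!), a finite piece of the second part."""
    return sum((Fraction(factorial(a), factorial(k))
                for k in range(a + 1, a + 1 + extra_terms)), Fraction(0))


for a in range(1, 8):
    head = factorial(a) * e_partial(a + 1)   # a! * (1 + 1 + ... + 1/a!)
    assert head.denominator == 1             # this part is an integer
    t = tail_after_a(a)
    assert 0 < t < Fraction(1, a)            # the rest is strictly between 0 and 1/a
```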




The number π, which is defined to be the ratio of the area of a circle to the square of its radius, is the limit of a sequence of rational numbers

$$q_n = \frac{\text{number of pairs } (a, b) \text{ of integers such that } a^2 + b^2 \le n^2}{n^2}$$

as was seen in Exercises 2, 3, §2.3. However, as is well known, π is not a rational number. (There is no simple proof of this fact.)
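To see how slowly this sequence approaches π, the lattice-point count can be computed directly. The sketch assumes pairs with $a^2 + b^2 \le n^2$ are counted (boundary points included); for each a, the admissible b run over $|b| \le \lfloor\sqrt{n^2 - a^2}\rfloor$, which `math.isqrt` gives exactly. Helper names are hypothetical.

```python
import math
from fractions import Fraction


def lattice_count(n: int) -> int:
    """Number of integer pairs (a, b) with a*a + b*b <= n*n."""
    return sum(2 * math.isqrt(n * n - a * a) + 1 for a in range(-n, n + 1))


def q_n(n: int) -> Fraction:
    return Fraction(lattice_count(n), n * n)


for n in (1, 10, 100, 1000):
    print(n, float(q_n(n)), abs(float(q_n(n)) - math.pi))
```

Each tenfold increase in n costs about ten times the work here (and a hundred times as many lattice points), yet the error shrinks only roughly in proportion to 1/n, which is the practical sense in which this sequence is computationally useless.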

In order for the operations of calculus to be meaningful it is necessary, therefore, to extend the concept of ‘number’ in such a way that these limits $\sqrt{2}$, e, π and others like them are ‘numbers’. The extended concept is of course that of a real number. As in the case of natural numbers and rational numbers, what is needed is a system for representing real numbers and for performing operations on them.

Intuitively speaking, a real number is a number which can be approximated arbitrarily closely by rational numbers (usually decimal fractions are used). Therefore a real number can be ‘represented’ by giving a sequence of rational numbers converging to it. The Cauchy Convergence Criterion gives a definition of ‘convergence’ for sequences of rational numbers without reference to real numbers. Hence, the Cauchy Criterion tells which sequences of rational numbers ‘represent real numbers’: A sequence $q_1, q_2, \ldots$ of rational numbers is said to be convergent if it is true that for every rational number $\epsilon > 0$ there is a natural number N such that for any two natural numbers $n, m > N$ the condition $|q_n - q_m| < \epsilon$ holds. Such a sequence will be said to ‘represent a real number’ and the ‘arithmetic of real numbers’ will be described as an arithmetic which applies to such sequences.

Since the remainder of the discussion is devoted to this arithmetic of convergent sequences of rational numbers it is important to have as clear a formulation of the idea of convergence as possible. An alternative statement of the Cauchy Criterion is the following: An interval $\{\check{q} \le x \le \hat{q}\}$ of rational numbers will be said to contain a given infinite sequence $\{q_1, q_2, q_3, \ldots\}$ of rational numbers if it contains all but a finite number of terms $q_n$ of the sequence, that is, if there is an N such that $\check{q} \le q_n \le \hat{q}$ for all $n > N$. Using this terminology, a given sequence $\{q_n\}$ is convergent if and only if it is ‘contained in arbitrarily small intervals’. That is, a sequence is convergent if and only if for every $\epsilon > 0$ there exist $\check{q}, \hat{q}$ such that $0 < \hat{q} - \check{q} < \epsilon$ and such that the interval $\{\check{q} \le x \le \hat{q}\}$ contains the sequence. This condition clearly implies the Cauchy Criterion ($\check{q} \le q_n \le \hat{q}$ and $\check{q} \le q_m \le \hat{q}$ imply $|q_n - q_m| \le \hat{q} - \check{q}$) and, conversely, if the Cauchy Criterion is fulfilled, then an N can be found such that $|q_n - q_m| < \epsilon/2$ for $n, m \ge N$. Hence $q_N - \epsilon/2 < q_m < q_N + \epsilon/2$ for $m \ge N$, that is, the sequence is contained in the interval $\{q_N - (\epsilon/2) \le x \le q_N + (\epsilon/2)\}$ of length ε.

The above sequence representing $\sqrt{2}$ was proved to be convergent in §7.5. The above sequence for e can be shown to be convergent as follows: The sequence increases, and after the nth term the further increase is less than

$$\frac{1}{n!}\left[1 + \frac{1}{n+1} + \frac{1}{(n+1)^2} + \cdots\right] = \frac{n+1}{n \cdot n!}$$

(if S denotes the sum in brackets, then $(n+1)S = (n+1) + S$, so $S = (n+1)/n$). Hence the interval $\left\{q_n \le x \le q_n + \frac{n+1}{n \cdot n!}\right\}$ contains the sequence and has arbitrarily small length. It is much more difficult to show that the above sequence for π is convergent; this is the substance of Exercise 2, §2.3. Note that for this example, unlike the preceding ones, the convergence is slow and each $q_n$ difficult to compute; computationally this sequence is useless.
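The interval for e can be checked against later terms of the sequence. The sketch below builds $\{q_n \le x \le q_n + (n+1)/(n \cdot n!)\}$ exactly and confirms, over a finite horizon of later terms (a machine can inspect only finitely many), that they all lie inside it while the interval's length shrinks rapidly. Helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def e_partial(n: int) -> Fraction:
    """q_n = 1 + 1 + 1/2! + ... + 1/(n-1)!."""
    return sum((Fraction(1, factorial(k)) for k in range(n)), Fraction(0))


def e_interval(n: int) -> tuple[Fraction, Fraction]:
    """The interval {q_n <= x <= q_n + (n+1)/(n*n!)} from the text."""
    qn = e_partial(n)
    return qn, qn + Fraction(n + 1, n * factorial(n))


for n in range(1, 10):
    lo, hi = e_interval(n)
    # every later term inspected (up to a finite horizon) stays in the interval
    assert all(lo <= e_partial(m) <= hi for m in range(n, n + 15))
    print(n, float(hi - lo))
```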

Two convergent sequences can, of course, represent the same real number. For example, the decimal expansion $\sqrt{2} = 1.41421\ldots$ means that $\sqrt{2}$ can be represented by the convergent sequence 1, 1.4, 1.41, 1.414, 1.4142, ... as well as by the sequence given above. The intuitive notion of real number suggests that if this is the case then an arbitrarily small interval about this ‘real number’ would contain both sequences. This can be stated in terms of sequences of rational numbers:

(1′) $\{q_n\} \equiv \{q'_n\}$ means that there exist arbitrarily short intervals which contain both $\{q_n\}$ and $\{q'_n\}$.

Written out in full the condition (1′) is: Given any rational number $\epsilon > 0$ there exist rational numbers $\check{q}, \hat{q}$ and natural numbers N, M such that $0 < \hat{q} - \check{q} < \epsilon$ and such that $\check{q} \le q_n \le \hat{q}$ whenever $n > N$, and $\check{q} \le q'_n \le \hat{q}$ whenever $n > M$.

*The term ‘equivalent’ is more commonly used; ‘congruent’ is used here because it suggests a notion from the domain of arithmetic and because the word ‘equivalent’ has other uses.

Two convergent sequences of rational numbers will be considered to be ‘the same’ or congruent* in the arithmetic of real numbers if the condition (1′) is satisfied. As before, this terminology is justified only if it is shown that

(5′) $\{q_n\} \equiv \{q_n\}$ (reflexive)
$\{q_n\} \equiv \{q'_n\}$ implies $\{q'_n\} \equiv \{q_n\}$ (symmetric)
$\{q_n\} \equiv \{q'_n\}$ and $\{q'_n\} \equiv \{q''_n\}$ imply $\{q_n\} \equiv \{q''_n\}$ (transitive).

The first two statements are tautologies. To prove the third it suffices to note that an interval of length less than ε/2 can be found containing both $\{q_n\}$ and $\{q'_n\}$, and another of length less than ε/2 can be found containing both $\{q'_n\}$ and $\{q''_n\}$. These intervals must overlap because they have points of $\{q'_n\}$ in common, and hence together they give an interval of length less than ε which contains both $\{q_n\}$ and $\{q''_n\}$.
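Condition (1′) speaks about whole infinite sequences, so no program can verify it. What a program can do, as a finite stand-in, is take the terms of two sequences from some index onward (up to a fixed horizon) and measure the shortest interval containing both sets of terms. The sketch below does this for the Newton sequence for $\sqrt{2}$ and the decimal truncations 1, 1.4, 1.41, ...; the helper names are hypothetical and the horizon of 10 terms is an arbitrary choice.

```python
from fractions import Fraction
from math import isqrt


def newton_sqrt2(count):
    terms = [Fraction(1)]
    while len(terms) < count:
        q = terms[-1]
        terms.append((q + 2 / q) / 2)
    return terms


def decimal_sqrt2(count):
    """1, 1.4, 1.41, 1.414, ...: sqrt(2) truncated to k = 0, 1, 2, ... places."""
    return [Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k) for k in range(count)]


def common_interval(s, t, start):
    """Smallest interval containing s[start:] and t[start:]; a finite-prefix
    stand-in for 'an interval containing both sequences'."""
    tail = s[start:] + t[start:]
    return min(tail), max(tail)


s, t = newton_sqrt2(10), decimal_sqrt2(10)
for start in (2, 4, 6):
    lo, hi = common_interval(s, t, start)
    print(start, float(hi - lo))
```

The printed lengths shrink as the starting index grows, which is what (1′) predicts for two sequences representing the same real number.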


The formula $e = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n$ means that the sequence $p_n = \left(1 + \frac{1}{n}\right)^n$ is congruent to the sequence for e given above; this can be proved directly (see Exercise 8). The problem of computing π can be described as the problem of finding other sequences congruent to the above sequence which are computationally more practical (see §7.5).

The relation $\{q_n\} < \{q'_n\}$ is very naturally defined by:

(2′) $\{q_n\} < \{q'_n\}$ means that there exist rational numbers $\check{q} < \hat{q} < \check{q}' < \hat{q}'$ such that $\{q_n\}$ is contained in the interval $\{\check{q} \le x \le \hat{q}\}$ and $\{q'_n\}$ in the interval $\{\check{q}' \le x \le \hat{q}'\}$.

Written out in full the condition (2′) states that, in addition to $\check{q} < \hat{q} < \check{q}' < \hat{q}'$, there exist natural numbers N, M such that $\check{q} \le q_n \le \hat{q}$ for $n > N$ and $\check{q}' \le q'_n \le \hat{q}'$ for $n > M$. The fact that

(6′) $\{q_n\} \equiv \{p_n\}$, $\{q'_n\} \equiv \{p'_n\}$, $\{q_n\} < \{q'_n\}$ imply $\{p_n\} < \{p'_n\}$

is proved as follows: Let $\check{q} < \hat{q} < \check{q}' < \hat{q}'$ be given for $\{q_n\}, \{q'_n\}$ as above. Choose ε less than $(\check{q}' - \hat{q})/2$ and find an interval of length less than ε which contains $\{q_n\}, \{p_n\}$ and another of length less than ε which contains $\{q'_n\}, \{p'_n\}$. These intervals cannot overlap (because then together they would form an interval of length less than $\check{q}' - \hat{q}$ which would have to contain rational numbers $\le \hat{q}$ and rational numbers $\ge \check{q}'$, which is impossible) and serve to prove that $\{p_n\} < \{p'_n\}$.

If $\{q_n\}$ and $\{q'_n\}$ are convergent sequences of rational numbers then $\{q_n + q'_n\}$ is a convergent sequence of rational numbers. This follows immediately from the fact that $\check{q} \le q_n \le \hat{q}$ and $\check{q}' \le q'_n \le \hat{q}'$ imply that $\check{q} + \check{q}' \le q_n + q'_n \le \hat{q} + \hat{q}'$; since $(\hat{q} + \hat{q}') - (\check{q} + \check{q}') = (\hat{q} - \check{q}) + (\hat{q}' - \check{q}')$, this shows that if $\{q_n\}, \{q'_n\}$ can be contained in intervals of length ε then $\{q_n + q'_n\}$ can be contained in an interval of length 2ε. Thus

(3′) $\{q_n\} + \{q'_n\} = \{q_n + q'_n\}$

defines an operation on convergent sequences of rational numbers. The above argument also shows that

(7′) $\{q_n\} \equiv \{p_n\}$ and $\{q'_n\} \equiv \{p'_n\}$ imply $\{q_n + q'_n\} \equiv \{p_n + p'_n\}$

and hence that the definition (3′) is consistent with the definition (1′) of ‘sameness’ in the arithmetic of real numbers.

Similarly, if $\{q_n\}, \{q'_n\}$ are convergent sequences of rational numbers so is $\{q_nq'_n\}$, and hence

(4′) $\{q_n\} \cdot \{q'_n\} = \{q_nq'_n\}$

defines an operation on convergent sequences of rational numbers. The fact that

(8′) $\{q_n\} \equiv \{p_n\}$ and $\{q'_n\} \equiv \{p'_n\}$ imply $\{q_nq'_n\} \equiv \{p_np'_n\}$

shows that this can be considered as a multiplication operation in the arithmetic of real numbers. To prove these statements let $q, q', \epsilon$ be such that the interval $\{|x - q| \le \epsilon\}$ contains $\{q_n\}, \{p_n\}$ and the interval $\{|x - q'| \le \epsilon\}$ contains $\{q'_n\}, \{p'_n\}$ (always in the sense of ‘contain’ defined above, that is, all but a finite number of points of the sequences lie in the stated intervals); then

$$|q_nq'_n - qq'| = |q_nq'_n - q_nq' + q_nq' - qq'| \le |q_n|\,|q'_n - q'| + |q'|\,|q_n - q| \le (|q| + \epsilon)\epsilon + |q'|\epsilon$$

for all sufficiently large n, hence $\{q_nq'_n\}$ and $\{p_np'_n\}$ are contained in the same interval of length $2\epsilon(|q| + |q'| + \epsilon)$. Since |q|, |q′| are bounded and ε can be made arbitrarily small this completes the proof.

*Meaning, of course, that if x, x′ are both solutions then x ≡ x′.

The rules (1′), (2′), (3′), (4′), justified by (5′), (6′), (7′), (8′), define the arithmetic of real numbers. This arithmetic is also called the ‘real number field’ or the ‘real number system’. It has the following properties: The trichotomy law ‘given two elements $\{q_n\}, \{q'_n\}$ of the arithmetic, exactly one of the conditions $\{q_n\} < \{q'_n\}$, $\{q_n\} \equiv \{q'_n\}$, $\{q_n\} > \{q'_n\}$ holds’ is proved as follows: If $\{q_n\} \not\equiv \{q'_n\}$ then there must exist a rational number $\epsilon > 0$ such that no interval of length less than ε contains both $\{q_n\}$ and $\{q'_n\}$. Since $\{q_n\}, \{q'_n\}$ are convergent, there are intervals of length less than ε/2 containing each of them. These intervals cannot overlap, since otherwise the assumption on ε would be contradicted. They therefore serve to prove either $\{q_n\} < \{q'_n\}$ or $\{q_n\} > \{q'_n\}$.

If $r_1, r_2, r_3$ are convergent sequences of rational numbers, then the commutative, associative, and distributive laws $r_1 + r_2 = r_2 + r_1$, $r_1r_2 = r_2r_1$, $r_1 + (r_2 + r_3) = (r_1 + r_2) + r_3$, $r_1(r_2r_3) = (r_1r_2)r_3$, $r_1(r_2 + r_3) = r_1r_2 + r_1r_3$ follow trivially from the analogous laws for the arithmetic of rational numbers.

Given a rational number q, let $\bar{q}$ denote the constant (and therefore convergent) sequence $\{q, q, q, \ldots\}$. Then $q_1 < q_2$, $q_1 = q_2$, $q_1 > q_2$ are equivalent to $\bar{q}_1 < \bar{q}_2$, $\bar{q}_1 \equiv \bar{q}_2$, $\bar{q}_1 > \bar{q}_2$ respectively; moreover $\bar{q}_1 + \bar{q}_2 = \overline{q_1 + q_2}$ and $\bar{q}_1\bar{q}_2 = \overline{q_1q_2}$, and hence all rules of the arithmetic of rational numbers apply to such sequences.

The sequence $\bar{0}$ satisfies $r + \bar{0} \equiv r$ for all convergent sequences of rational numbers r. If $r = \{q_n\}$ is a convergent sequence of rational numbers then $-r$ can be defined to be $\{-q_n\}$ because this is a convergent sequence of rational numbers which satisfies $r + (-r) \equiv \bar{0}$. The sequence $\bar{1}$ satisfies $\bar{1} \cdot r \equiv r$ for all r. If $r = \{q_n\}$ is convergent and $r \not\equiv \bar{0}$ then $r^{-1}$ can be defined to be $\{q_n^{-1}\}$ because this sequence is convergent and satisfies $r \cdot r^{-1} \equiv \bar{1}$. Hence the equation $rx + r' \equiv r''$ for $r \not\equiv \bar{0}$ has the unique* solution $x = (r'' + (-r'))r^{-1}$. The only statement here which requires any proof is that $\{q_n^{-1}\}$ is convergent. In fact, individual terms $q_n$ may be zero, hence $q_n^{-1}$ may not be defined for some n. However, by the trichotomy law there is a rational $\delta > 0$ such that $|q_n| > \delta$ for all sufficiently large n. Then

$$|q_n^{-1} - q_m^{-1}| = \frac{|q_m - q_n|}{|q_n q_m|} \le \frac{1}{\delta^2}\,|q_n - q_m|,$$

hence $\{q_n^{-1}\}$ satisfies the Cauchy Criterion if $\{q_n\}$ does.

*Here ε is a real number. Since ε > 0 implies ε > ε′ for some rational number ε′ > 0, this actually involves no greater generality than assuming that ε itself is rational.


Given $r_1, r_2$, the relation $r_1 < r_2$ holds if and only if $r_2 - r_1 > \bar{0}$. This is proved simply by setting $r_1 = \{q_n\}$, $r_2 = \{q'_n\}$, choosing disjoint intervals containing these sequences, letting δ be the distance which separates these intervals and noting that then $q'_n - q_n \ge \delta$ for all sufficiently large n.

This reduces the ordering relation $r_1 < r_2$ to the notion of positivity $r > \bar{0}$. Since $r > \bar{0}$ if and only if $r > \bar{\delta}$ for some rational $\delta > 0$, it follows that ‘$r_1, r_2 > \bar{0}$ implies $r_1 + r_2 > \bar{0}$ and $r_1r_2 > \bar{0}$’. Hence ‘$r_1 < r_2$, $r_2 < r_3$ imply $r_1 < r_3$’ and ‘$r_1 < r_2$, $\bar{0} < r_3$ imply $r_1r_3 < r_2r_3$’. Finally $r > \bar{0}$ if and only if $-r < \bar{0}$.

The absolute value $|r|$ of r is defined to be r if $r > \bar{0}$, $\bar{0}$ if $r \equiv \bar{0}$, $-r$ if $r < \bar{0}$. The proofs of $r \le |r|$, $|r_1r_2| \equiv |r_1|\,|r_2|$, $|r_1 + r_2| \le |r_1| + |r_2|$ are as before. For every r there is a rational K such that $|r| < \bar{K}$ (take K larger than the absolute values of both endpoints of an interval containing r). The Archimedean laws ‘given $r_1, r_2$ with $r_1 > \bar{0}$ there is a natural number n such that $nr_1 > |r_2|$’ and ‘given $r_1, r_2$ with $r_1 > \bar{0}$ and $|r_2| < \bar{1}$ there is a natural number n such that $|r_2^n| < r_1$’ are also proved as before.


If $r = \{q_n\}$ then the sequence $\bar{q}_n$ in the arithmetic of real numbers converges to the limit r; that is, given any $\epsilon > 0$ there is an N such that $|r - \bar{q}_n| < \epsilon$ for all $n > N$. In other words, every convergent sequence of rational numbers has a limit in the arithmetic of real numbers. This is, of course, a triviality since the limit is just the sequence itself, i.e. the ‘real number’ it represents. Slightly less trivial is the fact that every convergent sequence of real numbers has a limit in the arithmetic of real numbers. This basic property of the real number system is called completeness.


Theorem

Completeness of the real number system. If $r_1, r_2, r_3, \ldots$ is an infinite sequence in the arithmetic of real numbers with the property that for every* $\epsilon > 0$ there is an N such that $|r_n - r_m| < \epsilon$ whenever $n, m > N$, then there is an element $r_\infty$ in the arithmetic of real numbers with the property that for every $\epsilon > 0$ there is an N such that $|r_n - r_\infty| < \epsilon$ whenever $n > N$. In short, every convergent sequence of real numbers (sequence satisfying the Cauchy Criterion) has a limit which is a real number. The limit is, of course, unique.


Proof

The main idea of the proof is to ‘weed out’ sequences satisfying the Cauchy Criterion so that they have the property that the nth term of the sequence represents the limit to n decimal places. More specifically, the given sequence $r_1, r_2, r_3, \ldots$ can be assumed at the outset to have the property

(9) $|r_n - r_m| < 10^{-n}$ $(m > n)$

($r_n$ represents all succeeding terms $r_m$ with an accuracy of n decimal places) because if it does not then it can be replaced by one which does. This is done as follows: By the Cauchy Criterion there is an N such that $|r_N - r_m| < 1/10$ for all $m > N$. Throw out all terms $r_1, r_2, \ldots, r_{N-1}$ of the given sequence and renumber the remaining terms (the new $r_i$ being the old $r_{i+N-1}$) so that the term which was $r_N$ is now $r_1$. Next select a term $r_M$ such that $|r_M - r_n| < 1/100$ for $n > M$, throw out all of the terms $r_2, r_3, \ldots, r_{M-1}$, and renumber so that $r_M$ becomes $r_2$. Continuing this ‘weeding’ process, the given sequence can be reduced to one which satisfies (9) for all n.
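The weeding step can be imitated on a concrete convergent sequence, provided something plays the role of the Cauchy Criterion's N. The sketch below uses the sequence for e, and substitutes the explicit tail bound $(n+1)/(n \cdot n!)$ derived earlier for the N that an abstract Cauchy sequence would supply; that substitution is an assumption of the illustration, not part of the proof. It keeps indices $N_1 < N_2 < \cdots$ and checks property (9) on the kept terms.

```python
from fractions import Fraction
from math import factorial


def e_partial(n):
    """q_n = 1 + 1 + 1/2! + ... + 1/(n-1)!."""
    return sum((Fraction(1, factorial(k)) for k in range(n)), Fraction(0))


def tail_bound(n):
    # After the nth term the sequence for e grows by less than this amount.
    return Fraction(n + 1, n * factorial(n))


def weed(levels):
    """Hypothetical helper: indices N_1 < N_2 < ... with tail_bound(N_k) < 10**-k."""
    chosen, n = [], 1
    for k in range(1, levels + 1):
        while tail_bound(n) >= Fraction(1, 10 ** k):
            n += 1
        chosen.append(n)
        n += 1          # the next kept term must come strictly later
    return chosen


idx = weed(6)
kept = [e_partial(n) for n in idx]
# property (9): the kth kept term is within 10**-k of every later kept term
assert all(abs(kept[i] - kept[j]) < Fraction(1, 10 ** (i + 1))
           for i in range(len(kept)) for j in range(i + 1, len(kept)))
print(idx)
```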

Thus it can be assumed not only that (9) holds for the given sequence $r_1, r_2, r_3, \ldots$, but also that each $r_i$ of the sequence is represented by a sequence $r_i = \{q_n^{(i)}\}$ of rational numbers $q_n^{(i)}$ satisfying

$$|q_n^{(i)} - q_m^{(i)}| < 10^{-n} \quad (m > n).$$

It will be shown that then the ‘diagonal sequence’ $q_1^{(1)}, q_2^{(2)}, q_3^{(3)}, \ldots$ represents a real number with the desired property.


Note that if $r = \{q_n\}$ is a convergent sequence and if $\epsilon > 0$ is a rational number then $|r| < \bar{\epsilon}$ if and only if the sequence $\{q_n\}$ is contained in a subinterval of $\{|x| < \epsilon\}$; this is simply a restatement of the conditions $r - \bar{\epsilon} < \bar{0}$, $r + \bar{\epsilon} > \bar{0}$. Thus the assumption $|r_i - r_j| < 10^{-i}$ $(j > i)$ implies $|q_n^{(i)} - q_n^{(j)}| < 10^{-i}$ for all sufficiently large n. Together with the assumption on the sequences $\{q_n^{(i)}\}$ this gives

$$|q_i^{(i)} - q_j^{(j)}| \le |q_i^{(i)} - q_n^{(i)}| + |q_n^{(i)} - q_n^{(j)}| + |q_n^{(j)} - q_j^{(j)}| < 10^{-i} + 10^{-i} + 10^{-j} < 3 \cdot 10^{-i}$$

for $j > i$ (and for all sufficiently large n but, since the conclusion is independent of n, this need not be stated).


This proves that the sequence $\{q_i^{(i)}\}$ is convergent. Let $r_\infty$ denote the real number it represents. Then $|r_i - r_\infty|$ is estimated by estimating

$$|q_n^{(i)} - q_n^{(n)}| \le |q_n^{(i)} - q_i^{(i)}| + |q_i^{(i)} - q_n^{(n)}| < 10^{-i} + 3 \cdot 10^{-i} = 4 \cdot 10^{-i}$$

for all $n > i$. Thus $|r_i - r_\infty|$ can be made arbitrarily small and $r_\infty$ has the desired property. Finally, the uniqueness of $r_\infty$ follows from the triangle inequality $|r_\infty - r'_\infty| \le |r_\infty - r_n| + |r_n - r'_\infty|$, which shows that if $|r_n - r_\infty|$ and $|r_n - r'_\infty|$ can both be made arbitrarily small then $|r_\infty - r'_\infty|$ is less than every positive number, which cannot be true if $|r_\infty - r'_\infty| \not\equiv \bar{0}$, i.e. cannot be true if $r_\infty \not\equiv r'_\infty$.

This completes the discussion of the properties of the arithmetic of real numbers. It is convenient now to drop the use of the bars and to let a rational number q be considered as an element of the extended arithmetic. Moreover, the symbol ≡ will again be replaced by the ordinary = sign.

An ‘object’ in the arithmetic of real numbers is called a real number. Rules for representing real numbers have been defined and relations and operations have been defined for the representations (which is what is important), but no attempt has been made here to define what the representations represent. The concept of ‘real number’, like those of ‘natural number’ and ‘rational number’, remains undefined.


Exercises


1 Let $q_1, q_2, q_3$ be the rational numbers which satisfy the equations

$$8q_1 + 2 = 5$$
$$5q_2 + 7 = 1$$
$$7q_3 + 4 = 9.$$

Find $q_1 + q_2 + q_3$ and $q_1q_2q_3$, first using the ordinary notation of the arithmetic of fractions and then using the triad notation of the text.


2 Given a triad (a, b, c) of natural numbers representing q, find a triad representing $q^{-1}$. [Begin by writing q as (a, b, b + d) or minus such a number.]


3 Two natural numbers a, b are said to be ‘congruent modulo 6’, written ‘a ≡ b (mod 6)’, if there exist natural numbers p, q such that a + 6p = b + 6q. Show that this relation of congruence (‘sameness’) is reflexive, symmetric and transitive. Show that it is consistent with addition and multiplication of natural numbers, and that it therefore defines an arithmetic in which there are only six distinct ‘objects’. Show that subtraction is possible in this arithmetic (an equation of the form a + x = b has a unique solution x for all a, b) even though it is not possible in the arithmetic of natural numbers. What is zero in this arithmetic? Show that division by non-zero elements in this arithmetic is not possible (i.e. an equation ax = b with a ≢ 0 need not have a unique solution x). For which elements a of this arithmetic does the equation $x^2 = a$ have a solution x? (Give a list.)


4 ‘Arithmetic modulo n’ (n a natural number) is the arithmetic defined by the congruence relation a ≡ b (mod n), which means a + pn = b + qn for some natural numbers p, q. All statements in Exercise 3 for the case n = 6 are true for arbitrary n, except for the impossibility of division. For which natural numbers n is division by non-zero elements possible in an arithmetic modulo n? For a given n, which natural numbers a are ‘invertible mod n’, i.e. have inverses b such that ab ≡ 1 (mod n)? [Find these answers experimentally.]
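Since the exercise asks for an experimental answer, a brute-force table is a natural tool. In the sketch below the residues modulo n are represented by 1, 2, ..., n, with n playing the role of zero (an assumed convention that matches the text's natural numbers starting at 1); the function names are hypothetical. Running it for a range of n produces the data from which the pattern can be guessed.

```python
def invertible_residues(n):
    """Residues a in 1..n having some b in 1..n with a*b ≡ 1 (mod n)."""
    return [a for a in range(1, n + 1)
            if any((a * b - 1) % n == 0 for b in range(1, n + 1))]


def division_possible(n):
    """True when every non-zero residue (a in 1..n-1) is invertible mod n."""
    return len(invertible_residues(n)) == n - 1


for n in range(2, 13):
    print(n, invertible_residues(n), division_possible(n))
```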


5 Picturesquely speaking, two natural numbers are congruent 
modulo n if and only if it is possible to go from one to the 
other in steps of size n. Show that a is invertible mod n if and 
only if it is possible to go from any natural number to any 
other natural number by a combination of steps of size n and 
steps of size a. Given a, n let S be the set of all natural numbers 
which can be reached from 1 by a combination of steps of 
size n and steps of size a. Show that there must be a natural 
number d such that S is the set of all natural numbers of the 
form 1 + pd (p = 0,1, 2, 3,...) and that d is the greatest 
common factor of a and n. Apply this to prove a theorem 
stating the conclusion of Exercise 4. 


6 The proof of Exercise 5 is ‘non-constructive’ in that it makes appeal to ‘the set of all numbers which can be reached in steps of size a and n’. It is made constructive by the following process known as the Euclidean Algorithm: Given a, n with a < n, then either n is a multiple of a or there are natural numbers p, b such that n = pa + b where b < a. Then either a is a multiple of b or a = qb + c where c < b. In this way one can construct a decreasing sequence $n > a > b > c > \cdots$ which, of course, eventually terminates. From n = pa + b it follows that any number which can be reached in steps of size a, b can also be reached in steps of size n, a; moreover the answer is given by the constructive formula b = n − pa telling how to take steps of size b if steps of size n, a are allowed. Similarly any number which can be reached in steps of size b, c can be reached in steps of size a, b (and hence in steps of size n, a) and an explicit formula can be given for the answer. Thus to take a step of size 1 when steps of size 49 and 32 are allowed, one first constructs the sequence 49 > 32 > 17 > 15 > 2 > 1 and learns how to take steps of size 17, 15, 2, 1 successively. Use this method to solve the equation 49p = 1 + 32q (p, q natural numbers). In the same way solve 48p = 1 + 31q and 63p = 1 + 40q.
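The procedure just described can be mechanized. The sketch below uses hypothetical helper names: `remainder_chain` builds the decreasing sequence (for 49 and 32 it reproduces 49 > 32 > 17 > 15 > 2 > 1), and `bezout` carries out the back-substitution in the usual iterative form of the extended Euclidean algorithm, which records how each remainder is reached in steps of the two original sizes. `solve_steps` then shifts the result into natural numbers p, q with Ap = 1 + Bq, assuming A > B and that A, B have no common factor.

```python
def remainder_chain(n, a):
    """Euclidean algorithm: n > a > b > c > ... down to the last non-zero remainder."""
    chain = [n, a]
    while chain[-2] % chain[-1] != 0:
        chain.append(chain[-2] % chain[-1])
    return chain


def bezout(n, a):
    """Return (g, x, y) with n*x + a*y = g, the last non-zero remainder."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while a:
        k = n // a
        n, a = a, n - k * a            # one division step: n = k*a + remainder
        x0, x1 = x1, x0 - k * x1       # track each remainder as a combination
        y0, y1 = y1, y0 - k * y1       # of the two original step sizes
    return n, x0, y0


def solve_steps(A, B):
    """Natural numbers p, q with A*p = 1 + B*q, assuming A > B and gcd(A, B) = 1."""
    g, x, _ = bezout(A, B)
    if g != 1:
        raise ValueError("no solution: A and B have a common factor")
    p = x % B                 # shift by a multiple of B so that p is positive
    q = (A * p - 1) // B      # exact, since A*p ≡ 1 (mod B)
    return p, q


print(remainder_chain(49, 32))       # [49, 32, 17, 15, 2, 1]
```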


7 Is the fraction 1953/5115 in lowest terms?


8 Prove that $\lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n = e$ where e is the real number represented by the sequence $q_n = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{(n-1)!}$. [Expand $p_n = (1 + 1/n)^n$ by the binomial theorem and show that $p_n < q_m$ for $m > n$. Then show that for every n, ε there is an N such that $p_m > q_n - \epsilon$ whenever $m > N$. Conclude that any interval which contains the sequence $\{q_n\}$ contains the sequence $\{p_n\}$ (in the sense of ‘contains’ defined in the text), hence that $\{p_n\}$ is a convergent sequence congruent to $\{q_n\}$.] Find lim (1 + 5) to three decimal places.


9 Prove the alternating series test: If $a_1 - a_2 + a_3 - a_4 + \cdots$ is an alternating series in which $a_n > a_{n+1} > 0$, and if $\lim_{n\to\infty} a_n = 0$, then the sequence of partial sums $s_n = \sum_{k=1}^{n} (-1)^{k-1} a_k$ is convergent.


10 Prove that an increasing sequence $q_{n+1} > q_n$ which is not convergent is not bounded (for every K there is an N such that $q_N > K$), hence that a bounded increasing sequence is convergent.


11 Archimedean law. Show that if a, b are natural numbers then there is a natural number n such that na > b. [This is easy.] Show that the same is true for positive rational numbers a, b. [Use a common denominator.] Show that if a, b are positive rational numbers then $(1 + a)^n > b$ for n sufficiently large. Conclude that if $|q_1| < 1$ and $q_2 > 0$ then $|q_1|^n < q_2$ for n sufficiently large.


12 Continued fractions. The answer to Exercise 7 can be formulated as follows: 1953/5115 can be written as a fraction in which the numerator is 1 and the denominator is 2 + 1209/1953. Then 1209/1953 can be written as a fraction in which the numerator is 1 and the denominator is 1 + 744/1209. Then 744/1209 can be treated similarly and the process repeated until the denominator is a whole number. This gives 1953/5115 as a fraction whose denominator is a fraction whose denominator is a fraction whose denominator is a fraction..., the process eventually terminating. This is the representation of 1953/5115 as a continued fraction. The theory of continued fractions has many profound and beautiful applications, for example Lambert’s proof that π is not a rational number.* One of the principal difficulties in the theory of continued fractions, albeit a trivial one, is the notational awkwardness of writing fractions within fractions within fractions, ad infinitum. This is obviated by using 2 × 2 matrices instead of fractions. For example, the computations of the Euclidean algorithm can be written

$$\begin{pmatrix}1953\\5115\end{pmatrix} = \begin{pmatrix}0&1\\1&2\end{pmatrix}\begin{pmatrix}1209\\1953\end{pmatrix},$$
$$\begin{pmatrix}1209\\1953\end{pmatrix} = \begin{pmatrix}0&1\\1&1\end{pmatrix}\begin{pmatrix}744\\1209\end{pmatrix},$$
$$\vdots$$
$$\begin{pmatrix}93\\186\end{pmatrix} = \begin{pmatrix}0&1\\1&2\end{pmatrix}\begin{pmatrix}0\\93\end{pmatrix},$$

hence all together

$$\begin{pmatrix}1953\\5115\end{pmatrix} = \begin{pmatrix}0&1\\1&2\end{pmatrix}\begin{pmatrix}0&1\\1&1\end{pmatrix}\begin{pmatrix}0&1\\1&1\end{pmatrix}\begin{pmatrix}0&1\\1&1\end{pmatrix}\begin{pmatrix}0&1\\1&1\end{pmatrix}\begin{pmatrix}0&1\\1&1\end{pmatrix}\begin{pmatrix}0&1\\1&2\end{pmatrix}\begin{pmatrix}0\\93\end{pmatrix}.$$

When the matrix product is computed

$$\begin{pmatrix}1953\\5115\end{pmatrix} = \begin{pmatrix}8&21\\21&55\end{pmatrix}\begin{pmatrix}0\\93\end{pmatrix}$$

this gives the answer to Exercise 7. Note that the determinant of the product is $(-1)^7 = 8 \cdot 55 - 21^2$, which implies that 55 and 21 have no common factors, i.e. 21/55 is in lowest terms.

*For an introduction to this theory see C. D. Olds, Continued Fractions, Random House, 1963.
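The matrix bookkeeping of the example can be reproduced mechanically. In this sketch (hypothetical names `mat_mul` and `euclid_matrices`), each division step big = a·small + r is recorded as the matrix $\begin{pmatrix}0&1\\1&a\end{pmatrix}$, since that matrix sends the column (r, small) to (small, big). Applied to 1953/5115 it recovers the quotients 2, 1, 1, 1, 1, 1, 2, the product matrix and the common factor 93, and checks that the determinant is $(-1)^7$.

```python
def mat_mul(X, Y):
    """Product of 2x2 integer matrices given as ((x11, x12), (x21, x22))."""
    return ((X[0][0] * Y[0][0] + X[0][1] * Y[1][0], X[0][0] * Y[0][1] + X[0][1] * Y[1][1]),
            (X[1][0] * Y[0][0] + X[1][1] * Y[1][0], X[1][0] * Y[0][1] + X[1][1] * Y[1][1]))


def euclid_matrices(num, den):
    """For 0 < num < den, write (num, den) = M(a1) ... M(ak) (0, g), M(a) = ((0, 1), (1, a)).
    Returns the quotients a1..ak, the product matrix and g."""
    quotients, prod = [], ((1, 0), (0, 1))
    small, big = num, den
    while small:
        a, r = divmod(big, small)      # big = a*small + r
        quotients.append(a)
        prod = mat_mul(prod, ((0, 1), (1, a)))
        small, big = r, small
    return quotients, prod, big


a_list, P, g = euclid_matrices(1953, 5115)
print(a_list, P, g)        # [2, 1, 1, 1, 1, 1, 2] ((8, 21), (21, 55)) 93
det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
assert det == (-1) ** len(a_list)
```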


Use this method to reduce the fraction $222 to lowest terms. 
Write it as a continued fraction. 


2379 


13 Continued fraction expansion of a ratio of real numbers. Given two positive real numbers $r_0 > r_1 > 0$ one can define a sequence $r_0 > r_1 > r_2 > r_3 > \cdots$ of positive real numbers by the Euclidean algorithm

$$r_{n-1} = a_n r_n + r_{n+1}, \qquad a_n = \text{natural number}, \qquad 0 \le r_{n+1} < r_n.$$

If $r_{n+1} = 0$ for some n the sequence $r_0 > r_1 > r_2 > \cdots$ terminates, and otherwise continues indefinitely. Define


*This is the ‘golden section’ of the 
Greeks; see H. S. M. Coxeter, 
Introduction to Geometry, John 
Wiley and Sons, 1961, Chapter 11, 
or L. Zippin, Uses of Infinity, Random 
House, 1963, Chapter 5. 


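integers $p_n, q_n$ by the formula

$$\begin{pmatrix}0&1\\1&a_1\end{pmatrix}\begin{pmatrix}0&1\\1&a_2\end{pmatrix}\cdots\begin{pmatrix}0&1\\1&a_n\end{pmatrix} = \begin{pmatrix}p_{n-1}&p_n\\q_{n-1}&q_n\end{pmatrix}.$$

This gives two different definitions of $p_n, q_n$ but there is no conflict because the equation

$$\begin{pmatrix}p_n&p_{n+1}\\q_n&q_{n+1}\end{pmatrix} = \begin{pmatrix}p_{n-1}&p_n\\q_{n-1}&q_n\end{pmatrix}\begin{pmatrix}0&1\\1&a_{n+1}\end{pmatrix}$$

shows that the two definitions coincide. Show that $p_n/q_n$ is a fraction in lowest terms for all n, that if $r_{n+1} = 0$ then $r_1/r_0 = p_n/q_n$, and that if $r_{n+1} > 0$ for all n then $r_1/r_0$ is an irrational number and the fractions $p_n/q_n$ converge to $r_1/r_0$ as $n \to \infty$. The convergence in fact follows the pattern

$$\frac{p_0}{q_0} < \frac{p_2}{q_2} < \frac{p_4}{q_4} < \cdots < \frac{r_1}{r_0} < \cdots < \frac{p_5}{q_5} < \frac{p_3}{q_3} < \frac{p_1}{q_1}$$

and the error is estimated by

$$\left|\frac{r_1}{r_0} - \frac{p_n}{q_n}\right| < \left|\frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}}\right|, \qquad \frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} = \frac{(-1)^{n-1}}{q_n q_{n-1}}.$$

[It is useful to show that if r, p, q, R, P, Q are positive numbers with $p/q \ne P/Q$ then $(rp + RP)/(rq + RQ)$ lies (strictly) between p/q and P/Q.]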
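The recursion hidden in the matrix definition (right-multiplying by $\begin{pmatrix}0&1\\1&a\end{pmatrix}$ moves the right column to the left and appends a new one) gives a short way to list convergents. The sketch below (hypothetical name `convergents`, exact arithmetic via `fractions.Fraction`) checks the alternating pattern and the error estimate on the terminating example 1953/5115 from Exercise 12, whose quotients are 2, 1, 1, 1, 1, 1, 2; non-strict comparisons are used because the last convergent equals the ratio exactly. This is a check on examples, not a proof.

```python
from fractions import Fraction


def convergents(quotients):
    """Return [(p_1, q_1), (p_2, q_2), ...] where
    M(a_1) ... M(a_n) = ((p_{n-1}, p_n), (q_{n-1}, q_n)) and M(a) = ((0, 1), (1, a)).
    The empty product is the identity, so p_0/q_0 = 0/1."""
    p_prev, p, q_prev, q = 1, 0, 0, 1
    out = []
    for a in quotients:
        # right-multiplying by M(a): (p_prev, p) -> (p, p_prev + a*p), same for q
        p_prev, p = p, p_prev + a * p
        q_prev, q = q, q_prev + a * q
        out.append((p, q))
    return out


x = Fraction(1953, 5115)
cs = convergents([2, 1, 1, 1, 1, 1, 2])
prev_q = 1                                   # q_0
for n, (p, q) in enumerate(cs, start=1):
    c = Fraction(p, q)
    assert (c >= x) if n % 2 else (c <= x)   # odd convergents above, even below
    assert abs(x - c) <= Fraction(1, q * prev_q)
    prev_q = q
print(cs)
```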


14 The integer ratios $p_n/q_n$ are called the convergents of the ratio of real numbers $0 < r_1/r_0 < 1$. Show that the convergents of $r_1/r_0$ are the ‘best’ rational approximations to $r_1/r_0$ in the sense that if p/q is any rational number for which

$$\left|\frac{r_1}{r_0} - \frac{p}{q}\right| < \left|\frac{r_1}{r_0} - \frac{p_n}{q_n}\right|$$

where $p_n/q_n$ is a convergent, then $q > q_n$. In short, a better approximation than $p_n/q_n$ must have a larger denominator.


15 An important role in the theory of continued fractions is played by the ratio $r_1/r_0$ for which $a_1 = 1$, $a_2 = 1$, $a_3 = 1, \ldots$. Show that the convergents are 1/1, 1/2, 2/3, 3/5, 5/8, 8/13, 13/21, 21/34, ... and that the ratio $r_1/r_0$ is a solution of the equation $x^2 + x = 1$, hence

$$\frac{r_1}{r_0} = \frac{\sqrt{5} - 1}{2} = 2\sin 18^\circ = \text{chord subtended by an angle of } \frac{\pi}{5} \text{ (in a circle of radius 1)}.^*$$

Find this number as a decimal fraction accurate to three decimal places.


Chapter 9 | Further Study of Limits

16 The ancient trigonometric tables of Ptolemy expressed fractions in sexagesimal form rather than decimal form. For example, Ptolemy gives the value 31′25″ for the length of the chord which subtends an angle of ½° in a circle of radius 60, i.e.

$$120 \sin\left(\tfrac{1}{4}^\circ\right) = \frac{31}{60} + \frac{25}{60^2},$$

or

$$2 \sin\frac{\pi}{720} = \frac{31}{60^2} + \frac{25}{60^3}.$$

Using an accurate value of $\pi$, verify that this value is correct (as Ptolemy proved that it was) to the nearest $60^{-3}$.*

*For a lucid explanation of Ptolemy's techniques, see A. Aaboe, Episodes from the Early History of Mathematics, Random House, 1964, pp. 101–125.

† See O. Toeplitz, The Calculus, Univ. of Chicago Press, 1963, p. 16. This book contains a wealth of information on the history of calculus.
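The verification can be sketched in a few lines of Python, using the library value `math.pi` as the "accurate value of $\pi$". The helper `sexagesimal` is our own: it turns the places of a sexagesimal fraction into an exact rational. The script measures the discrepancy between $2\sin(\pi/720)$ and $31/60^2 + 25/60^3$ in units of $60^{-3}$; "correct to the nearest $60^{-3}$" means this discrepancy is less than half a unit.

```python
import math
from fractions import Fraction

def sexagesimal(whole, *places):
    """Exact value of whole + places[0]/60 + places[1]/60**2 + ..."""
    return whole + sum(Fraction(d, 60**k) for k, d in enumerate(places, start=1))

ptolemy = sexagesimal(0, 0, 31, 25)              # 31/60**2 + 25/60**3
true_value = 2 * math.sin(math.pi / 720)
error_in_units = (true_value - float(ptolemy)) * 60**3   # in units of 60**-3
print(float(ptolemy), true_value, error_in_units)
```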


17 A real number $a$ (for approximate) is said to represent another real number $t$ (for true) ‘with an accuracy of $n$ decimal places’ if $|t - a| < 10^{-n}$. (See Exercise 2, §2.2.) Show that:

(a) Every real number can be represented with an accuracy of $n$ decimal places by an $n$-place decimal fraction.

(b) Two representations which satisfy the condition of (a) differ by at most ±1 in the last place.


18 Sexagesimal fractions were long used in place of decimal fractions. For example, in 1250 the mathematician Leonardo Pisano found the real root of $x^3 + 2x^2 + 10x = 20$ to be

$$x = 1;22,7,42,33,4,40^{\dagger},$$

i.e.

$$x = 1 + \frac{22}{60} + \frac{7}{60^2} + \frac{42}{60^3} + \frac{33}{60^4} + \frac{4}{60^5} + \frac{40}{60^6}.$$

Assuming this answer is correct to the nearest $60^{-6}$, with how many decimal places of accuracy does this represent the root of the equation?
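One way to organize the computation, as a hedged sketch: convert the sexagesimal places to an exact rational, read "correct to the nearest $60^{-6}$" as an error of at most $\frac{1}{2}\cdot 60^{-6}$, and find the largest $n$ with that bound below $10^{-n}$ (the definition of Exercise 17). The stricter reading "error below $60^{-6}$" gives the same $n$. As an independent cross-check the script also brackets the root by bisection. All function names are our own.

```python
from fractions import Fraction

def sexagesimal(whole, *places):
    """Exact value of whole + places[0]/60 + places[1]/60**2 + ..."""
    return whole + sum(Fraction(d, 60**k) for k, d in enumerate(places, start=1))

def decimal_places(error_bound):
    """Largest n with error_bound < 10**-n (accuracy in the sense of Exercise 17)."""
    if error_bound <= 0:
        raise ValueError("error_bound must be positive")
    n = 0
    while error_bound < Fraction(1, 10 ** (n + 1)):
        n += 1
    return n

def f(x):
    return x**3 + 2 * x**2 + 10 * x - 20

def bisect(lo, hi, steps):
    """Shrink [lo, hi] around the root, assuming f(lo) < 0 < f(hi)."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return lo, hi

leonardo = sexagesimal(1, 22, 7, 42, 33, 4, 40)
# "correct to the nearest 60**-6" read as an error of at most half a unit there
print(float(leonardo), decimal_places(Fraction(1, 2 * 60**6)))
lo, hi = bisect(Fraction(1), Fraction(2), 60)
print(float(leonardo - lo) * 60**6)         # discrepancy, in units of 60**-6
```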


19 Infinite decimal fractions. It is often convenient to think of real numbers as being represented by infinite decimal fractions, e.g. $\frac{1}{3} = .3333\ldots$, $\frac{2}{3} = .6666\ldots$, $\pi = 3.141592\ldots$, etc. Such an expression is said to represent the real number $r$ if the sequence of rational numbers (decimal fractions) obtained by truncating the infinite decimal (taking the first $n$ places without rounding) represents the real number $r$ in the sense defined in the text.

(a) Show that every real number $r$ can be represented by an infinite decimal fraction. [Assume $r > 0$ and consider the largest $n$-place decimal which is less than $r$.]

(b) Show that if two infinite decimals represent the same real number then either they are identical or the real number they represent is a (finite) decimal fraction and one terminates with all 0's and the other terminates with all 9's, e.g. $3.6000\ldots = 3.5999\ldots$.
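The hint to (a) can be followed literally for positive rational $r$, where exact arithmetic is available. In the sketch below (names our own) `largest_decimal_below(r, n)` returns the integer $k$ for which $k/10^n$ is the largest $n$-place decimal strictly less than $r$. Note that for a finite decimal such as $18/5$ this construction produces the all-9's expansion of part (b). The test checks that the digits for different $n$ fit together, i.e. that they really come from one infinite decimal.

```python
import math
from fractions import Fraction

def largest_decimal_below(r, n):
    """Integer k such that k / 10**n is the largest n-place decimal strictly less than r."""
    return math.ceil(r * 10**n) - 1

def infinite_decimal(r, places):
    """First `places` digits of the infinite decimal built from the hint (r > 0)."""
    k = largest_decimal_below(r, places)
    whole, frac = divmod(k, 10**places)
    return f"{whole}.{frac:0{places}d}..."

for r in (Fraction(1, 3), Fraction(2, 3), Fraction(18, 5)):
    print(r, infinite_decimal(r, 8))
```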


9.2 | Real Functions of Real Variables 


Real Functions of Real Variables 


* Here R denotes the set of real 
numbers. 




9.2 


The arithmetic operations of addition and multiplication can be regarded as functions $R^2 \to R$, that is, as rules by which an element of $R$ is determined (the sum or product) when an element of $R^2$ (a pair of real numbers) is given.* This section is devoted to various types of functions $R^n \to R^m$ which arise in practice.

The simplest functions are those which can be described by formulas. Of these the simplest are those which can be described by formulas which involve only addition and multiplication, that is, by polynomial formulas. For example the formula $w = xy + yz + zx$ describes a function $R^3 \to R$ which assigns numbers ($w$) to triples of numbers $(x, y, z)$. Similarly

$$x = uv, \qquad y = u^2 + v^2$$

is a function $R^2 \to R^2$ (see §5.1 for a discussion of this particular function, its singularities, etc.). More generally, a polynomial function $f: R^n \to R^m$ is one which can be described by $m$ formulas

$$y_i = f_i(x_1, x_2, \ldots, x_n) \qquad (i = 1, 2, \ldots, m) \tag{1}$$

in which the right-hand sides are polynomials in $(x_1, x_2, \ldots, x_n)$, that is, sums of multiples of products of the $x_j$'s

$$f_i(x_1, x_2, \ldots, x_n) = \sum A\, x_1^{p_1} x_2^{p_2} \cdots x_n^{p_n}, \tag{2}$$

where $\sum$ indicates that there is a sum of a finite number of terms of the form $A x_1^{p_1} x_2^{p_2} \cdots x_n^{p_n}$. Here the $p_j$'s are integers $\geq 0$, so that $x_1^{p_1} x_2^{p_2} \cdots x_n^{p_n}$ is simply the result of several multiplications. The numbers $A$ are called the coefficients of the polynomial, and a polynomial function $f: R^n \to R^m$ is said to have integer coefficients if the $A$'s which occur in the formulas (1), (2) describing it are integers. A polynomial function $f: R^n \to R^m$ is said to have rational coefficients if the $A$'s in the formulas which define it are rational numbers.
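Formulas (1) and (2) say that a polynomial function is determined by finitely many pairs (exponents $p_1, \ldots, p_n$; coefficient $A$) for each component. The Python sketch below encodes exactly that data; the dictionary encoding and the function names are our own choices, not notation from the text. It evaluates $w = xy + yz + zx$ and the map $x = u^2 - v^2$, $y = 2uv$ that appears later in this section, and checks the integer/rational-coefficient conditions.

```python
from fractions import Fraction
from math import prod

# A polynomial R^n -> R is stored as a dict {(p_1, ..., p_n): A}, one entry per
# term A * x_1**p_1 * ... * x_n**p_n; a map R^n -> R^m is a list of m such dicts.

def evaluate(poly, x):
    return sum(A * prod(xj**pj for xj, pj in zip(x, exps)) for exps, A in poly.items())

def evaluate_map(f, x):
    return tuple(evaluate(f_i, x) for f_i in f)

def has_integer_coefficients(f):
    """True if every coefficient A is an integer (an int, or a Fraction with denominator 1)."""
    return all(Fraction(A).denominator == 1 for f_i in f for A in f_i.values())

def has_rational_coefficients(f):
    """True if every coefficient is stored as an exact rational (int or Fraction)."""
    return all(isinstance(A, (int, Fraction)) for f_i in f for A in f_i.values())

w = [{(1, 1, 0): 1, (0, 1, 1): 1, (1, 0, 1): 1}]     # w = xy + yz + zx
square_map = [{(2, 0): 1, (0, 2): -1}, {(1, 1): 2}]   # x = u^2 - v^2, y = 2uv
print(evaluate_map(w, (1, 2, 3)), evaluate_map(square_map, (2, 3)))
```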

An algebraic function is one which is defined implicitly by polynomial functions, that is, a function $g_i(y_1, \ldots, y_r, x_{r+1}, \ldots, x_n)$ or $h_i(y_1, \ldots, y_r)$ obtained by applying the Implicit Function Theorem to a polynomial function $y_i = f_i(x_1, x_2, \ldots, x_n)$ at a point $\bar{y}_i = f_i(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$ which is not a singularity. An algebraic function is by its very nature defined locally near a specified point.




The most familiar example of an algebraic function is $y = \sqrt{x}$, which is defined locally near $x = 1$ by saying that it is the inverse of $x = y^2$ near $(x, y) = (1, 1)$. Near $(x, y) = (1, -1)$ the inverse of $x = y^2$ is the algebraic function $y = -\sqrt{x}$. Similarly, near $(x, y) = (a^2, a)$ the inverse of $x = y^2$ is $y = \pm\sqrt{x}$ with the sign chosen to agree with the sign of $a$; if $a = 0$ the given function $x = y^2$ has a singularity at the given point and there is no inverse function. It was proved in §7.5 that the function $\sqrt{x}$ is in fact defined for all positive $x$; this is a special global property of this particular algebraic function.

A very important example of an algebraic function is the function obtained by inverting the relation

$$x = u^2 - v^2, \qquad y = 2uv.$$

(Writing $z = x + iy$ and $w = u + iv$ this relation is $z = w^2$, so the problem of inverting the relation is the problem of finding the ‘complex square root’.) This mapping $(u, v) \to (x, y)$ is non-singular of rank 2 except at $(u, v) = (0, 0)$. Hence near any point $x = u_0^2 - v_0^2$, $y = 2u_0v_0$ it defines $(u, v)$ as functions of $(x, y)$ provided $(u_0, v_0) \neq (0, 0)$. The important point which this example illustrates is that it is not possible to extend this algebraic function unambiguously to all values of $(x, y)$ even if singularities are avoided. (This was possible for $y = \sqrt{x}$, which extended to all values of $x$ which could be reached from $x = 1$ without crossing the singularity at $x = 0$.) This can be seen from a geometrical examination of the given relation, which is left to the reader (Exercise 4); suffice it to say that each point $(x, y) \neq (0, 0)$ is the image of exactly two points $(u, v)$, $(-u, -v)$ and that there is no way to choose one of these two for all $(x, y) \neq (0, 0)$ at once without having discontinuities. This shows that the ‘localness’ of algebraic functions is essential and cannot be avoided merely by avoiding singularities.
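The obstruction can be watched numerically. The sketch below (our own construction, using Python's `cmath`) starts at $z = 1$ with the root $w = 1$ and moves $z$ once around the unit circle in small steps, at each step keeping whichever of the two roots $\pm\sqrt{z}$ is closer to the previous choice, i.e. following one local branch continuously. When $z$ returns to 1 the root followed has become $-1$, so no single continuous choice works for all $(x, y) \neq (0, 0)$.

```python
import cmath
import math

def continue_square_root(steps=1000):
    """Follow a continuous choice of sqrt(z) as z = x + iy goes once around the
    unit circle, starting from w = 1 at z = 1. At each step keep whichever of
    the two roots +-cmath.sqrt(z) is closer to the previous choice."""
    w = 1 + 0j
    for k in range(1, steps + 1):
        z = cmath.exp(2j * math.pi * k / steps)
        r = cmath.sqrt(z)
        w = r if abs(r - w) <= abs(r + w) else -r
    return w

print(continue_square_root())   # z is back at 1, but the root followed is now near -1
```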

(More generally, if $f(w)$ is a polynomial function of one complex variable $w = u + iv$ then $x + iy = z = f(w) = f(u + iv)$ gives a map $(u, v) \to (x, y)$ of $R^2 \to R^2$. This map $(u, v) \to (x, y)$ is easily shown to be non-singular of rank 2 except at the finite number of points $(u, v)$ where the derivative $f'(w) = f'(u + iv)$ is zero, e.g. where $2w$ is zero in the example $f(w) = w^2$ above. Thus, locally near any point $(u, v)$, except a finite number, the relation $z = f(w)$ can be solved to give $(u, v)$ as algebraic functions of $(x, y)$. However, for each value of $(x, y)$ there are $n$ possible values for $(u, v)$, where $n$ is the degree of the polynomial $f$. In extending one of the local functions $(x, y) \to (u, v)$ one is led to the notions of analytic continuation, Riemann surface, and $n$-sheeted covering of the $xy$-plane which are fundamental to the more profound non-local study of algebraic functions.)

An important sub-class of the class of algebraic functions is the class of rational functions. A rational function is a function which can be expressed by a formula which is a quotient of polynomials, e.g.

$$f(x, y) = \frac{2xy}{x^2 - y^2}.$$

Such a function is defined at all points where the denominator is not zero. Rational functions arise in applying the Implicit Function Theorem to a polynomial in several variables which contains one variable to the first degree only. For example, the solution of

$$y^2 z - x^2 z + 2xy = u$$

for $z$ as a function of $(u, x, y)$ is

$$z = \frac{u - 2xy}{y^2 - x^2},$$

which for $u = 0$ is the function above. The zeros of the denominator are the singularities of the map, where such a solution $z = g(u, x, y)$ is impossible.

A function $f: R^n \to R^m$ which is not algebraic is said to be transcendental. The most important transcendental functions are the functions $R \to R$ given by the formulas $x = e^t$, $x = \cos t$, $x = \sin t$. These functions can be defined in a variety of ways, for example as the solutions of differential equations: Picard's method (§7.4) shows that the differential equation

$$\frac{dx}{dt} = x, \qquad x(0) = 1$$

has a unique solution $x = x(t)$ defined for all values of $t$ and that this function can be expanded as a power series

$$x(t) = 1 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots.$$




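This function is by definition $x(t) = e^t$. Similarly, the differential equation

$$\frac{dx}{dt} = -y, \quad x(0) = 1; \qquad \frac{dy}{dt} = x, \quad y(0) = 0$$

has a unique solution $(x, y) = (x(t), y(t))$ defined for all $t$ and these solutions can be expanded as power series

$$x(t) = 1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \frac{t^6}{6!} + \cdots, \qquad y(t) = t - \frac{t^3}{3!} + \frac{t^5}{5!} - \cdots.$$

These functions are by definition $(x(t), y(t)) = (\cos t, \sin t)$.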
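Picard's method can be run exactly on polynomials, which shows where the power series come from. In the sketch below (our own illustration; polynomials are lists of exact coefficients and all names are hypothetical) each iterate is $x_{k+1}(t) = 1 + \int_0^t x_k$ for $e^t$, and $x_{k+1} = 1 - \int_0^t y_k$, $y_{k+1} = \int_0^t x_k$ for the pair $(\cos t, \sin t)$. After $k$ iterations the coefficients agree exactly with the series above through degree $k$.

```python
import math
from fractions import Fraction

# Polynomials in t are lists of exact coefficients [c_0, c_1, c_2, ...].

def integrate(c):
    """Coefficients of the antiderivative that vanishes at t = 0."""
    return [Fraction(0)] + [a / (j + 1) for j, a in enumerate(c)]

def plus_one(c):
    return [c[0] + 1] + c[1:]

def evaluate(c, t):
    return sum(float(a) * t**j for j, a in enumerate(c))

def picard_exp(iterations):
    """Picard iterates for dx/dt = x, x(0) = 1: x_{k+1}(t) = 1 + int_0^t x_k."""
    x = [Fraction(1)]
    for _ in range(iterations):
        x = plus_one(integrate(x))
    return x

def picard_cos_sin(iterations):
    """Picard iterates for dx/dt = -y, x(0) = 1; dy/dt = x, y(0) = 0."""
    x, y = [Fraction(1)], [Fraction(0)]
    for _ in range(iterations):
        x, y = plus_one([-a for a in integrate(y)]), integrate(x)
    return x, y

print(picard_exp(6))                         # coefficients 1, 1, 1/2, 1/6, ... as Fractions
x, y = picard_cos_sin(20)
print(evaluate(x, 1.0) - math.cos(1.0), evaluate(y, 1.0) - math.sin(1.0))
```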

A function which can be expressed explicitly or implicitly in terms of algebraic operations and the functions $e^t$, $\cos t$, $\sin t$ is called an elementary function. The elementary functions include

algebraic functions

$\log x$ (the inverse of $x = e^t$, defined for $x > 0$)

$\tan x$ ($= \sin x/\cos x$)

$\sinh x$ ($= \tfrac{1}{2}(e^x - e^{-x})$)

$\operatorname{Arcsin} x$ (the inverse of $x = \sin t$ at $(x, t) = (0, 0)$, defined for $-1 < x < 1$)

$x^y$ ($= e^{y \log x}$, defined for $x > 0$ and all $y$)

etc.

and all functions $R^n \to R^m$ which can be expressed in terms of such functions.

Functions other than elementary functions which are frequently used include: Bessel functions and elliptic functions, which are defined by differential equations which arise in mathematical physics; the $\Gamma$-function, which is discussed in Exercise 10 of §9.6; the $\zeta$-function, which is important in the application of calculus to number theory (analytic number theory) and which is defined as the sum of the series

$$\zeta(x) = 1 + \frac{1}{2^x} + \frac{1}{3^x} + \frac{1}{4^x} + \cdots$$

for $x > 1$; and hypergeometric functions, which can be defined by differential equations, and which are also used in physics.
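A numerical glimpse of the series defining $\zeta(x)$, as a sketch with names of our own choosing: partial sums, together with the integral-test bound $\sum_{k > N} k^{-x} < N^{1-x}/(x - 1)$ on the omitted tail. For $x = 2$ the sums can be compared with Euler's classical value $\zeta(2) = \pi^2/6$.

```python
import math

def zeta_partial_sum(x, N):
    """1 + 1/2**x + ... + 1/N**x, a partial sum of the series for zeta(x), x > 1."""
    return sum(1 / k**x for k in range(1, N + 1))

def tail_bound(x, N):
    """Integral test: the omitted terms sum to less than N**(1 - x) / (x - 1)."""
    return N ** (1 - x) / (x - 1)

for N in (10, 100, 1000):
    print(N, zeta_partial_sum(2, N), tail_bound(2, N))
print(math.pi**2 / 6)                       # Euler's value of zeta(2)
```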


*See Exercise 4, §9.6. Actually the first such example was discovered by Bolzano, but the Weierstrass example is more famous.


Of course the notion of a function $R^n \to R^m$ is a very general one and any rule assigning points in $R^m$ to (some or all) points of $R^n$ is included. An interesting example is the function

$$f(x) = \int_{-\infty}^{\infty} \frac{\sin xy}{y}\, dy. \tag{2}$$

For each fixed $x$ the integrand on the right is a continuous function of $y$ for all $y$ except $y = 0$, where it is not defined; however, the fact that $\frac{d}{dt}\sin t = \cos t$ implies that

$$\lim_{y \to 0} \frac{\sin xy}{y} = x \lim_{h \to 0} \frac{\sin h}{h} = x \cos 0 = x,$$

hence if the integrand is given the value $x$ at $0$ it is a continuous function for all $y$. As $|y|$ becomes large, the value of the integrand oscillates between $\frac{1}{|y|}$ and $-\frac{1}{|y|}$, from which it is easily shown (an alternating series converges if its terms decrease to zero) that $\lim_{K \to \infty} \int_{-K}^{K} \frac{\sin xy}{y}\, dy$ exists.

For each fixed $x$, then, the formula (2) defines a number $f(x)$. What is interesting about the function (2) is that it can be described in a very different way as

$$f(x) = \begin{cases} \pi & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -\pi & \text{if } x < 0. \end{cases} \tag{3}$$

The fact that (2) and (3) define the same function is by no means obvious; it is proved in §9.6. Historically, this example and others like it were influential in the development of the modern concept of an arbitrary function which is not assumed to be expressible explicitly or implicitly by a single formula such as (2); generally speaking, even very ‘artificial’ functions such as (3) can often be expressed by formulas such as (2), but the description (3) is more manageable in practice than the formula.
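Formulas (2) and (3) can at least be compared numerically. The sketch below is a crude check under our own choices: the integral in (2) is replaced by its truncation to $-K < y < K$ with $K = 1000$ and evaluated by the midpoint rule with 200,000 steps, with the integrand given the value $x$ at $y = 0$ as in the text. The truncation error is of order $1/(|x|K)$, so the agreement with (3) is only to a few decimal places.

```python
import math

def integrand(x, y):
    """sin(xy)/y, given its limiting value x at y = 0."""
    return x if y == 0 else math.sin(x * y) / y

def truncated_integral(x, K=1000.0, steps=200_000):
    """Midpoint-rule approximation to the integral of sin(xy)/y over -K < y < K."""
    h = 2 * K / steps
    return h * sum(integrand(x, -K + (i + 0.5) * h) for i in range(steps))

for x in (2.0, 0.0, -1.0):
    print(x, truncated_integral(x))         # compare with pi, 0, -pi from (3)
```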

The concept of an ‘arbitrary’ function is, however, far too broad. This was dramatically demonstrated by an example of Weierstrass* of a function $f: R \to R$ which is defined and continuous for all $x$ but not differentiable for any $x$ (i.e. $\lim_{h \to 0} f(x + h) = f(x)$ but

$$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

does not exist for any $x$).

The theorems of calculus (for example the Implicit Function Theorem or Stokes' Theorem) do not apply to such ‘arbitrary’ functions, and in fact cannot even be stated without the assumption that the functions involved are at least differentiable. It is necessary, therefore, to develop a vocabulary of conditions, such as continuity, differentiability, etc., in order to formulate the theorems of calculus. Such conditions are discussed in the following section.

Exercises


1 What functions $R^n \to R^m$ can be expressed in terms of addition and multiplication alone?


2 Once the function $e^x$ has been defined by its differential equation, the function $y = \log x$ can be defined as the inverse of $x = e^y$ $(x > 0)$. Show that the function so defined satisfies

$$\log x = \int_1^x \frac{dt}{t} \qquad (x > 0)$$

and

$$\log x = \lim_{n \to \infty} n\left(\sqrt[n]{x} - 1\right) \qquad (x > 0).$$

Either of these relations can be used as an alternative definition of $\log x$ for $x > 0$.
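Before proving the two relations it may help to see them agree numerically. In this sketch (names our own) the integral is approximated by the midpoint rule with 10,000 steps, the limit is sampled at $n = 10^6$, and Python's `math.log` serves only as a reference value.

```python
import math

def log_by_integral(x, steps=10_000):
    """Midpoint-rule approximation to the integral of dt/t from 1 to x (x > 0)."""
    h = (x - 1) / steps
    return h * sum(1 / (1 + (k + 0.5) * h) for k in range(steps))

def log_by_limit(x, n):
    """n * (x**(1/n) - 1), which should tend to log x as n grows."""
    return n * (x ** (1 / n) - 1)

for x in (0.5, 2.0, 10.0):
    print(x, math.log(x), log_by_integral(x), log_by_limit(x, 10**6))
```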


3 Once the functions $(\cos x, \sin x)$ have been defined by their differential equation, the function $y = \operatorname{Arcsin} x$ can be defined as the solution of $x = \sin y$, $-1 < x < 1$, $-\frac{\pi}{2} < y < \frac{\pi}{2}$. Write $\operatorname{Arcsin} x$ as a definite integral. Similarly, give a domain of definition of $\operatorname{Arctan} x$ and write it as a definite integral.


4 To prove that the function $x = u^2 - v^2$, $y = 2uv$ has the property stated in the text, i.e. that it cannot be inverted globally for $(x, y) \neq (0, 0)$, prove that the integral of $d\theta = (x\,dy - y\,dx)/(x^2 + y^2)$ over any differentiable closed curve not passing through the origin is a multiple of $2\pi$. Show that the pullback of $d\theta$ under $(u, v) \to (x, y)$ is $2\,d\theta$. Conclude that if there were an inverse $(x, y) \to (u, v)$ then the pullback of $d\theta$ would have to be $\frac{1}{2}\,d\theta$ and there would be a curve over which the integral of $d\theta$ was $\pi$.


5 Let $a_2, a_1, a_0$ be real numbers. Set $z = x + iy$, $w = u + iv$ and express the map $z = a_2 w^2 + a_1 w + a_0$ as an explicit function $R^2 \to R^2$ in terms of the coordinates $(u, v)$, $(x, y)$ and the constants $a_2, a_1, a_0$. Show that the singularities of this map occur at the points $(u, v)$ such that $2a_2 w + a_1 = 0$. Prove that the same is true if $a_n = b_n + ic_n$ $(n = 0, 1, 2)$. Using the arithmetic of complex numbers, prove that the singularities of $z = a_n w^n + a_{n-1} w^{n-1} + \cdots + a_0$ are at the points $w$ where $n a_n w^{n-1} + (n - 1) a_{n-1} w^{n-2} + \cdots + a_1 = 0$.

6 Let $f: R \to R$ be defined by the formula

$$f(x) = \int_{-\infty}^{\infty} e^{-t^2} \cos xt\, dt = \lim_{K \to \infty} \int_{-K}^{K} e^{-t^2} \cos xt\, dt.$$

Sketch the integrand for various values of $x$ and draw whatever conclusions you can about the function $f(x)$. An explicit formula for $f(x)$ is derived in §9.6. [It is not difficult to see for which values of $x$ this function $f(x)$ is defined, for which $x$ it is a maximum, a symmetry of the function, and its approximate value for very large values of $x$.]


9.3 | Uniform Continuity and Differentiability


Uniform Continuity and Differentiability


9.3


The condition of ‘uniform continuity’ of a function on a set occurs very naturally in many contexts, for example in the definition of the transcendental function $10^x$. For natural numbers $a$, the product of 10 with itself $a$ times is denoted $10^a$. For rational numbers $q$, say for the solution $q$ of $aq + b = c$ where $a, b, c$ are natural numbers, the number $10^q$ is defined by the equation $(10^q)^a \cdot 10^b = 10^{aq+b} = 10^c$, that is, $10^q$ is the positive number $y$ such that $y^a = 10^c/10^b$. As was shown in §7.5, this defines $y$ uniquely. If $(a', b', c')$ is another description of $q$ then the solution of $10^b y^a = 10^c$ also solves $10^{b'} y^{a'} = 10^{c'}$, hence $10^q$ is well-defined. The extension of $10^x$ to irrational values of $x$ is ‘by continuity’, which means that for a sequence of rational numbers $x_n$ converging to $x$, one defines $10^x$ to be $\lim 10^{x_n}$. It is this ‘extension by continuity’ which requires the notion of uniform continuity.

Intuitively, the definition $10^\pi = \lim 10^{x_n}$ means that, for example, $10^\pi$ is approximated by $10^{x_n}$ where $x_n$ is a rational number approximating $\pi$ (e.g. by $10^{22/7}$, the seventh root of $10^{22}$), and that any number of decimal places of $10^\pi$ can be found by taking $x_n$ to be sufficiently near to $\pi$. As a practical matter, the actual computation of $10^{x_n}$, where $x_n$ is a rational number with a large denominator, involves sophisticated numerical techniques, logarithms, etc. What is at issue, however, is not the practicality of finding $10^\pi$ but the question of whether $10^\pi = \lim 10^{x_n}$ is a valid definition of $10^\pi$. In order to prove that it is, one must show that if $\lim x_n = \pi$ then $10^{x_n}$ is a convergent sequence, and that if $\lim x'_n = \pi$ then $\lim 10^{x'_n} = \lim 10^{x_n}$. This essentially reduces to the question “If $x, x'$ are rational numbers such that $|x - x'|$ is small, how large can $|10^x - 10^{x'}|$ be?” The estimates needed to define $10^\pi$ are provided by the following observations. Let $x, x'$ be rational numbers in the interval $\{3 < x < 4\}$ and let $\delta = |x - x'|$. Suppose $x' > x$, so that $x' = x + \delta$. Then $0 < 10^{x'} - 10^x = 10^x(10^\delta - 1) < 10^4(10^\delta - 1)$. Given any margin for error $\epsilon$, choose a natural number $N$ so large that $10^{1/N} - 1 < \epsilon \cdot 10^{-4}$, i.e. $10 < (1 + \epsilon \cdot 10^{-4})^N$. For example, since $(1 + \epsilon \cdot 10^{-4})^N > 1 + N\epsilon \cdot 10^{-4}$, it suffices to take $N > 9 \cdot 10^4 \cdot \epsilon^{-1}$. Then $|x' - x| < 1/N$ implies $|10^{x'} - 10^x| < 10^4(10^\delta - 1) < 10^4(10^{1/N} - 1) < \epsilon$, hence $10^x$ is determined to within $\epsilon$ when $x$ is determined to within $1/N$. Thus to determine $10^\pi$ to within $\epsilon$ it suffices to find $10^x$ for $x$ a rational number such that $|x - \pi| < 1/2N$. Any other $10^{x'}$ found by the same method will differ by less than $\epsilon$ from this one. Thus $10^\pi$ can be found to any prescribed degree of accuracy, and $10^\pi$ is therefore a well-defined real number.

Letting X denote the set of rational numbers in the 
interval {3 < x < 4}, these observations are a special 
case of the following theorem: 


Theorem 


Let $f: R^n \to R^m$ be a function which is defined at all points of a subset $X$ of $R^n$. The function $f$ is said to be uniformly continuous on $X$ if it is true that for every $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x) - f(x')| < \epsilon$ whenever $x, x'$ are points of $X$ satisfying $|x - x'| < \delta$ (where, as usual, $|x - x'| < \delta$ means that each of the $n$ coordinates of $x$ differs by less than $\delta$ from the corresponding coordinate of $x'$, and similarly for $|f(x) - f(x')| < \epsilon$). The closure of $X$, denoted $\bar{X}$, is defined to be the set of all points of $R^n$ which can be written as limits of sequences of points of $X$. If $f$ is uniformly continuous on $X$ then the formula $f(\lim x_n) = \lim f(x_n)$ defines a function on the closure $\bar{X}$ of $X$ which is uniformly continuous on $\bar{X}$ and which is the only uniformly continuous function on $\bar{X}$ which agrees with $f$ on $X$.




All that is required to prove this theorem is a simple 
unravelling of the definitions of ‘uniform continuity’, 
‘limit’, and ‘real number’, which will be omitted. 
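The estimate for $10^x$ given above can be exercised numerically. In this sketch (our own; floating-point `**` stands in for the exact rational powers of the text, and $x = 22/7$ is taken as a sample rational in $\{3 < x < 4\}$) we compute the text's choice $N > 9 \cdot 10^4/\epsilon$ in exact arithmetic, confirm $10^{1/N} - 1 < \epsilon \cdot 10^{-4}$, and check that moving $x$ by $1/(2N)$ changes $10^x$ by less than $\epsilon$. The last lines show rational approximations to $\pi$ giving successively closer values of $10^x$.

```python
import math
from fractions import Fraction

def n_for_eps(eps):
    """A natural number N > 9 * 10**4 / eps, computed exactly from the float eps."""
    return math.floor(Fraction(9 * 10**4) / Fraction(eps)) + 1

def modulus_check(eps, x):
    """For 3 < x < x + 1/(2N) < 4: is 10**(1/N) - 1 < eps * 1e-4, and is
    |10**x' - 10**x| < eps for x' = x + 1/(2N)?"""
    N = n_for_eps(eps)
    x_prime = x + 1 / (2 * N)
    return N, 10 ** (1 / N) - 1 < eps * 1e-4, abs(10**x_prime - 10**x) < eps

for eps in (1.0, 1e-2, 1e-4):
    print(eps, modulus_check(eps, 22 / 7))

# Rational approximations to pi give successively closer values of 10**x.
for p, q in ((22, 7), (333, 106), (355, 113)):
    print(f"10**({p}/{q}) =", 10 ** (p / q))
print("10**pi =", 10 ** math.pi)
```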

The definitions of the arithmetic operations $+, -, \times, \div$ for real numbers are special cases of this theorem, based on the fact that these operations are uniformly continuous functions on bounded intervals of rational numbers (excluding intervals which contain 0 in the case of $\div$). In the same way, any polynomial function $f: R^n \to R^m$ is uniformly continuous on any bounded subset $X$ of $R^n$.

Loosely speaking, to say that a function is uniformly 
continuous on a set X means that the value is determined 
to an arbitrary degree of accuracy once the argument is 
determined with sufficient accuracy. From this point of 
view it is clear that any function which describes the 
result of a physical measurement must be uniformly 
continuous unless its value is in some sense indeterminate. 

The notion of uniform continuity gives a very simple 
definition of the notion of ‘differentiable’, namely the 
following: 


Definition 


Let $X$ be a subset of $R^n$ and let $f: R^n \to R^m$ be a function which is defined at all points of $R^n$ which are within $\delta$ of $X$; that is, a number $\delta > 0$ is given such that the domain of $f$ includes all points $y$ of $R^n$ for which there is a point $x$ of $X$ satisfying $|x - y| < \delta$. Then

$$M_{s,x}(h) = \frac{f(x + sh) - f(x)}{s} \tag{1}$$

is defined for all $x$ in $X$, $h$ an $n$-tuple satisfying $|h| < 1$, and $s$ a real number satisfying $|s| < \delta$, $s \neq 0$. The function $f$ is said to be uniformly differentiable on $X$ if $M_{s,x}(h)$ is uniformly continuous on this set of $(s, x, h)$ where it is defined. When this is the case $M_{s,x}(h)$ has (by the theorem above) a uniformly continuous extension to a set which includes the set $\{s = 0,\ x \text{ in } X,\ |h| < 1\}$. The function so defined is called the derivative of $f$ at $x$ evaluated on $h$. It is of the special form

$$(h_1, h_2, \ldots, h_n) \mapsto \left(\sum_{j=1}^n a_{1j} h_j, \sum_{j=1}^n a_{2j} h_j, \ldots, \sum_{j=1}^n a_{mj} h_j\right),$$

where the $a_{ij}$ are uniformly continuous functions of $(x_1, x_2, \ldots, x_n)$. This formula is abbreviated by naming the coordinates on $R^m$, say $y_1, y_2, \ldots, y_m$, and by writing it

$$dy_i = \sum_{j=1}^n \frac{\partial y_i}{\partial x_j}\, dx_j \qquad (i = 1, 2, \ldots, m),$$

where $\frac{\partial y_i}{\partial x_j}$ denotes $a_{ij}$. In the absence of names for the coordinates on $R^m$ the functions $a_{ij}$ are also denoted $\frac{\partial f_i}{\partial x_j}$.


Intuitively, to say that f is uniformly differentiable 
means that the value of (1), which can be imagined as 
‘f at x under a microscope of power 1:s’, can be de- 
termined to within any prescribed e by making s suf- 
ficiently small and by determining x, h with sufficient 
accuracy (see §5.3). 

If $f$ is a polynomial function and $X$ is a bounded subset of $R^n$ then $f$ is uniformly differentiable on $X$ for the simple reason that the function (1) is again a polynomial function (the terms in the numerator which contain no $s$ cancel out, so that the $s$ in the denominator can be cancelled). The proof of the Implicit Function Theorem can be strengthened to show that whenever the given function $f$ is uniformly differentiable, the solution functions $g, h$ are also. Hence all algebraic functions are (locally) uniformly differentiable. Similarly, since the functions $e^x$, $\sin x$, $\cos x$ are (locally) uniformly differentiable (see Exercises 6 and 7), it follows that all elementary functions are (locally) uniformly differentiable.
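For a polynomial the uniformity can be made fully explicit. Take, as our own example, $n = m = 1$, $f(x) = x^3$ and $X = \{|x| \le 2\}$. Expanding (1) gives $M_{s,x}(h) = 3x^2h + 3xh^2s + h^3s^2$, so $|M_{s,x}(h) - 3x^2h| \le |s|(6 + |s|)$ for all $|x| \le 2$, $|h| \le 1$ at once. The sketch below checks this bound on a grid; it illustrates, and does not replace, the argument in the text.

```python
def f(t):
    return t**3

def M(s, x, h):
    """The difference quotient (1): (f(x + s*h) - f(x)) / s, for s != 0."""
    return (f(x + s * h) - f(x)) / s

def max_deviation(s, grid=201):
    """max |M_{s,x}(h) - 3 x**2 h| over a grid of points with |x| <= 2, |h| <= 1."""
    xs = [-2 + 4 * i / (grid - 1) for i in range(grid)]
    hs = [-1 + 2 * j / (grid - 1) for j in range(grid)]
    return max(abs(M(s, x, h) - 3 * x * x * h) for x in xs for h in hs)

for s in (0.1, 0.01, 0.001, -0.001):
    # M_{s,x}(h) - 3x^2 h = 3xh^2 s + h^3 s^2, so the deviation is at most |s|(6 + |s|)
    print(s, max_deviation(s), abs(s) * (6 + abs(s)))
```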

As the terminology suggests, the notions of uniform 
continuity and uniform differentiability of a function on 
a set X are stronger forms of the notions of continuity 
and differentiability defined in the earlier chapters. For 
purposes of comparison these definitions are: 

Let $f: R^n \to R^m$ be a function defined at all points of a subset $X$ of $R^n$. The function $f$ is said to be continuous on $X$ if for every $\epsilon > 0$ and every $x$ in $X$ there is a $\delta > 0$ such that $|f(x) - f(x')| < \epsilon$ for all points $x'$ of $X$ satisfying $|x - x'| < \delta$. (The difference from uniform continuity is that $\delta$ may depend on the given $x$ as well as on $\epsilon$.) The function $f$ is said to be differentiable on $X$ if for every $x = (x_1, x_2, \ldots, x_n)$ in $X$

$$\frac{f_i(x_1, \ldots, x_j + s, \ldots, x_n) - f_i(x_1, \ldots, x_j, \ldots, x_n)}{s}$$

is defined for all sufficiently small $s \neq 0$, and if the limit as $s \to 0$ exists and is a continuous function on $X$ for all $mn$ values $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$.


Exercises


1 Prove directly that a polynomial function $f: R^n \to R$ is uniformly continuous on the set $\{|x| < K\}$ for any fixed $K$. [First show it is bounded, i.e. $|f(x)| < K'$ for some $K'$. Then consider $f(x + h) - f(x)$.]


2 Prove directly that a polynomial function $f: R^n \to R$ is uniformly differentiable on the set $\{|x| < K\}$ for any fixed $K$.


3 Give an explicit upper bound for the amount by which

$$\frac{4}{n} \sum_{j=1}^n \left(\frac{j}{n}\right)^3$$

differs from 1. [Do not use the formula for $\sum j^3$.] More generally, give an explicit upper bound for the amount by which an approximating sum

$$\Sigma(\alpha) = 4 \sum_j x_j^3\, \Delta x_j$$

to $\int_0^1 4x^3\, dx$ differs from 1 [where $\alpha$ denotes a subdivision of the interval $\{0 < x < 1\}$ into small subintervals indexed by $j$, together with a choice of a point $x_j$ in the $j$th interval, and where $\Delta x_j$ is the length of the $j$th interval]. Conclude that $\int_0^1 4x^3\, dx$ converges and is equal to 1.
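A numerical sanity check of Exercise 3, not the proof it asks for. The bound $12 \cdot (\text{mesh})$ used below is our own: it follows from $|4t^3 - 4s^3| \le 12|t - s|$ on $[0, 1]$ once one already knows the integral is 1, so it only confirms the expected behaviour. The random subdivision (seeded for reproducibility) and all function names are hypothetical illustrations.

```python
import random
from fractions import Fraction

def right_endpoint_sum(n):
    """(4/n) * sum_{j=1}^{n} (j/n)**3, computed exactly."""
    return Fraction(4, n) * sum(Fraction(j, n) ** 3 for j in range(1, n + 1))

def approximating_sum(points, tags):
    """4 * sum of x_j**3 * dx_j for subdivision points 0 = points[0] < ... < points[-1] = 1
    and a chosen point tags[j] in the j-th subinterval."""
    return 4 * sum(t**3 * (b - a) for a, b, t in zip(points, points[1:], tags))

for n in (10, 100, 1000):
    print(n, float(right_endpoint_sum(n) - 1))

rng = random.Random(0)
points = [0.0] + sorted(rng.random() for _ in range(500)) + [1.0]
tags = [rng.uniform(a, b) for a, b in zip(points, points[1:])]
mesh = max(b - a for a, b in zip(points, points[1:]))
print(approximating_sum(points, tags) - 1, 12 * mesh)
```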


4 Generalizing 3, show that if $f(x)$ is uniformly differentiable on $\{0 < x < 1\}$ then $\int_0^1 f'(x)\, dx$ converges and is equal to $f(1) - f(0)$; that is, prove directly that the Fundamental Theorem of Calculus holds for uniformly differentiable functions.


5 Prove that $f(x) = x^{-1}$ is uniformly continuous (and uniformly differentiable) on any set $\{x > a\}$ for $a > 0$, but not for $a = 0$.


6 Prove that $e^x$ (defined by its differential equation) is uniformly differentiable on any set $\{|x| < K\}$. [Use the formula for $e^{x+y}$.]


7 Prove that $\sin x$ and $\cos x$ are uniformly differentiable on the entire line $\{-\infty < x < \infty\}$. [Similar to 6.]


8 Prove that the function $e^{x \log 10}$ is identical with the function $10^x$ defined in the text, and hence that $10^x$ is uniformly differentiable on any interval $\{|x| < K\}$, with derivative $\log 10 \cdot 10^x$.


9 Prove that the function $f(x) = (\sin x)/x$ is uniformly differentiable on the entire line. [Rewrite $f(x)$ as $\int_0^1 \cos xt\, dt$. This extends $f$ to $x = 0$.]


10 Prove that the function $f(x) = |x|^p$ is uniformly continuous on any interval $\{|x| < K\}$ for $p > 0$.


11 Let $f(x)$ be the function which is $x^2 \sin(1/x)$ when $x \neq 0$ and which is 0 when $x = 0$. Show that $\lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$ exists for all $x$ but that $f$ is not differentiable.


12 Prove directly that if $f(x)$ is uniformly differentiable on an interval $\{a < x < b\}$ and if there is a $\delta > 0$ such that $f'(x) > \delta$ at all points $x$ in the interval, then $f$ does not assume a maximum value inside the interval.


13 Prove directly that if $f(x, y)$ is uniformly differentiable on the square $\{|x| < 1, |y| < 1\}$ and if there is a $\delta > 0$ such that at all points $(x, y)$ of the square either $\left|\frac{\partial f}{\partial x}\right| > \delta$ or $\left|\frac{\partial f}{\partial y}\right| > \delta$, then $f$ does not assume a maximum value inside the square.


14 Prove directly that if $A(x_1, x_2, \ldots, x_n)$ is uniformly continuous on an $n$-dimensional rectangle $R$ in $R^n$ then the integral $\int_R A\, dx_1 dx_2 \ldots dx_n$ converges. [Using uniform continuity the statement $U(S) \to 0$ as $|S| \to 0$ of §2.3 is easily proved.]


15 Let $X$ be the subset of $R$ which consists of all numbers of the form $1/n$ where $n$ is a positive integer. Prove that a function $f: R \to R$ defined at all points of $X$ is uniformly continuous on $X$ if and only if the sequence $f(1/n)$ is convergent.


16 Differentiation under the integral sign. Prove that if $f(x, y)$ is uniformly differentiable on $\{a < x < b, c < y < d\}$ then the function $F(y) = \int_a^b f(x, y)\, dx$ is uniformly differentiable on $\{c < y < d\}$ and its derivative is $F'(y) = \int_a^b \frac{\partial f}{\partial y}(x, y)\, dx$. [This is a very easy estimate.]


Compactness


9.4


A subset $X$ of $R^n$ is said to be compact if it is closed and bounded, that is, if every point which can be written as a limit of points of $X$ is itself in $X$ (closed) and if there is a real number $K$ such that all coordinates of all points of $X$ are less than $K$ in absolute value (bounded). Typically a compact set is a bounded set in $R^n$ defined by a finite number of equations and inequalities involving $\leq$ or $\geq$, for example the disk $x^2 + y^2 \leq 2$, the sphere $x^2 + y^2 + z^2 = 1$, the cube $\{|x| \leq 1, |y| \leq 1, |z| \leq 1\}$, etc. Typically a non-compact set is a set which is not bounded (e.g. $x^2 + y^2 \geq 2$) or a set which is defined by inequalities involving $<$ or $>$, for example $x^2 + y^2 < 2$ or $\{|x| < 1, |y| < 1, |z| < 1\}$. Such a set is not compact (in general) because a limit of points where a strict inequality is satisfied may not satisfy the inequality; for example, a point on the circle $x^2 + y^2 = 2$ can be written as a limit of points inside the disk $x^2 + y^2 < 2$. Of course when one speaks of subsets of $R^n$ one has in mind very general sorts of sets, of which these subsets defined by equalities and inequalities are very simple special cases.


9.4 | Compactness


*Here and in the remainder of this section $|x - x'|$ denotes, for $x, x'$ in $R^n$, the maximum absolute value of the difference of corresponding coordinates, i.e. if $x = (x_1, x_2, \ldots, x_n)$ and $x' = (x'_1, x'_2, \ldots, x'_n)$ then $|x - x'| = \max(|x_1 - x'_1|, |x_2 - x'_2|, \ldots, |x_n - x'_n|)$.


A continuous function on a compact set assumes a maximum value.


A continuous function on a compact set is uniformly continuous.


A subset $X$ of $R^n$ is compact if and only if every infinite sequence of points of $X$ has a point of accumulation in $X$.


Every (interior) covering of a compact set has a finite subcover.

The usefulness of the notion of compactness is illus- 
trated by the following four theorems. 


Theorem 1 


Let $X$ be a compact subset of $R^n$ and let $f: R^n \to R$ be a function which is defined and continuous at all points of $X$ (for every $x$ in $X$ and for every $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x) - f(x')| < \epsilon$ whenever $x'$ is a point of $X$ such that* $|x - x'| < \delta$). Then there is a point $\bar{x}$ of $X$ such that $f(\bar{x}) \geq f(x')$ for all $x'$ in $X$.


Theorem 2 


Let $X$ be a compact subset of $R^n$ and let $f: R^n \to R^m$ be a function which is defined and continuous at all points of $X$ (for every $x$ in $X$ and for every $\epsilon > 0$ there is a $\delta > 0$ such that* $|f(x) - f(x')| < \epsilon$ whenever $x'$ is a point of $X$ such that $|x - x'| < \delta$). Then $f$ is uniformly continuous on $X$ (for every $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x) - f(x')| < \epsilon$ whenever $x, x'$ are points of $X$ such that $|x - x'| < \delta$).


Bolzano-Weierstrass Theorem 


If $X$ is a compact subset of $R^n$ and if $x^{(1)}, x^{(2)}, x^{(3)}, \ldots$ is an infinite sequence of points of $X$ then there is a point $x^{(\infty)}$ of $X$ with the property that for every $\epsilon > 0$ an infinite number of points $x^{(j)}$ of the sequence satisfy $|x^{(j)} - x^{(\infty)}| < \epsilon$. Conversely, if $X$ is not compact then there is an infinite sequence $x^{(1)}, x^{(2)}, x^{(3)}, \ldots$ in $X$ for which there is no such point $x^{(\infty)}$ in $X$.


Heine-Borel Theorem 


Let $X$ be a compact subset of $R^n$ and let $\{U_\alpha\}$ be an infinite collection of subsets of $R^n$ which ‘cover’ $X$ in the sense that for every $x$ in $X$ there is a member $U_\alpha$ of the collection which contains $x$ in its interior; that is, there is a $\delta > 0$ such that all points $x'$ of $X$ satisfying $|x' - x| < \delta$ lie in $U_\alpha$. Then it is possible to select a finite number of the sets $U_\alpha$ in such a way that they ‘cover’ $X$ in the same way.


The similarity of these theorems is less evident in their statements than it is in their proofs. In all four proofs it is useful to consider the subdivision of $R^n$ by the planes

$$x_j = (i\text{-place decimal fraction}) \qquad (j = 1, 2, \ldots, n)$$

into cubes $10^{-i}$ on a side. If $X$ is compact then for each $i$ only a finite number of these cubes contain points of $X$ (because $X$ is bounded). In all four proofs one uses this observation to choose a nested sequence of cubes $C_i$, i.e. $C_0 \supseteq C_1 \supseteq C_2 \supseteq C_3 \supseteq \cdots$, such that $C_i$ is $10^{-i}$ on a side and such that the points common to $X$ and $C_i$ have some desired property. By the completeness of the real number system, the sequence of centers of the cubes $C_i$ has a limit, say $x^{(\infty)}$, which can also be described as the unique point of $R^n$ which is contained in all cubes $C_i$ of the nested sequence. For any $\delta > 0$ the cube $\{|x - x^{(\infty)}| < \delta\}$ contains all the cubes $C_i$ for $i$ sufficiently large; in particular, any such cube contains a point of $X$. Hence $x^{(\infty)}$ can be written as a limit of points of $X$ and therefore $x^{(\infty)}$ is in $X$ (because $X$ is closed). The point $x^{(\infty)}$ is then shown to have the desired property.


Proof of the Bolzano-Weierstrass Theorem

At least one of the cubes of side 1 in the subdivision of $\mathbb{R}^n$ by the planes $\{x_j = \text{integer}\}$ must contain an infinite number of points of the given sequence $x^{(1)}, x^{(2)}, x^{(3)}, \ldots$. Select such a cube and call it $C_0$. At least one of the $10^n$ cubes into which $C_0$ is divided by the planes $\{x_j = \text{integer}/10\}$ must contain an infinite number of points of the given sequence. Select such a cube and call it $C_1$. Continuing in this manner gives a nested sequence $C_0 \supset C_1 \supset C_2 \supset \cdots$ of cubes such that $C_i$ has side $10^{-i}$ and contains an infinite number of points of the given sequence. Let $x^{(\infty)}$ be the unique point common to all the cubes $C_i$. Then $x^{(\infty)}$ is in $X$; and, given $\epsilon > 0$, the cube $\{|x - x^{(\infty)}| < \epsilon\}$ contains all the $C_i$ beyond a certain point (take $i$ so large that $10^{-i} < \epsilon$). Hence the cube $\{|x - x^{(\infty)}| < \epsilon\}$ contains an infinite number of the points $x^{(j)}$, as desired. Conversely, if $X$ is not compact then either $X$ is not bounded, in which case there is a sequence $x^{(j)}$ of points in $X$ such that $|x^{(j)}| > j$, or $X$ is not closed, in which case there is a sequence $x^{(j)}$ of points of $X$ converging to a point not in $X$. Hence in either case there is a sequence $x^{(j)}$ of points in $X$ for which there is no such point $x^{(\infty)}$ in $X$.
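A computer can imitate the selection of nested cubes in this proof only for a finite stretch of the sequence. The sketch below is an illustration under that assumption. It replaces "contains an infinite number of points" by "contains the most of the first 30 terms", and it uses exact rational arithmetic (Python's `fractions` module) so that no rounding occurs at cube boundaries. The function names and the sample sequence $x^{(j)} = 1/3 + 10^{-j}$ are hypothetical. Because this sequence accumulates at $1/3$, the chosen corners should be the decimal digits of $0.333\ldots$.

```python
from collections import Counter
from fractions import Fraction
import math


def decimal_cube(point, i):
    """Integer corner (k_1, ..., k_n) of the cube of side 10**-i containing
    `point`, i.e. k_j * 10**-i <= x_j < (k_j + 1) * 10**-i for every j."""
    scale = 10 ** i
    return tuple(math.floor(Fraction(c) * scale) for c in point)


def nested_cubes(points, levels):
    """Choose cubes C_0, C_1, ..., C_{levels-1}.  C_i is the subcube of side
    10**-i of C_{i-1} that holds the most of the given points, a finite
    stand-in for 'infinitely many terms of the sequence'."""
    remaining = list(points)
    chosen = []
    for i in range(levels):
        counts = Counter(decimal_cube(p, i) for p in remaining)
        best = max(sorted(counts), key=counts.get)  # ties: smallest corner wins
        remaining = [p for p in remaining if decimal_cube(p, i) == best]
        chosen.append(best)
    return chosen


# First 30 terms of the (assumed) sequence x^(j) = 1/3 + 10**-j in R^1.
sample = [(Fraction(1, 3) + Fraction(1, 10 ** j),) for j in range(1, 31)]
print(nested_cubes(sample, 6))  # [(0,), (3,), (33,), (333,), (3333,), (33333,)]
```

Each cube is nested in the previous one because only the points lying in $C_{i-1}$ are kept when $C_i$ is chosen.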


Proof of the Heine-Borel Theorem

If $X$ cannot be covered by a finite number of the sets $U_\alpha$, then some cube $C_0$ of side 1 must have the property that the points of $X$ which lie in $C_0$ cannot be covered by a finite number of the $U_\alpha$. Subdividing $C_0$ into $10^n$ parts gives a cube $C_1$ of side $10^{-1}$ with the same property. Continuing this process ad infinitum gives a nested sequence of cubes with this property. Let $x^{(\infty)}$ be the point common to all the cubes $C_i$. By assumption there is a $U_\alpha$ which contains the points of $X$ in a cube of the form $\{|x - x^{(\infty)}| < \delta\}$, hence a $U_\alpha$ containing the points of $X$ in all of $C_i$ for some $i$. This shows that the points of $X$ in $C_i$ can be covered by a single set $U_\alpha$, contrary to assumption. This contradiction shows that a finite number of the sets $U_\alpha$ must suffice to cover all of $X$.


Proof of Theorem 2

If $f$ is not uniformly continuous then there is an $\epsilon$ for which no $\delta$ suffices; that is, there is an $\epsilon$ for which it is possible to find $x^{(j)}, \bar{x}^{(j)}$ in $X$ such that $|x^{(j)} - \bar{x}^{(j)}| < 10^{-j}$ but such that $|f(x^{(j)}) - f(\bar{x}^{(j)})| \ge \epsilon$. One can then choose a nested sequence $C_0 \supset C_1 \supset C_2 \supset \cdots$ such that each $C_i$ contains an infinite number of the points $x^{(j)}$ selected above. Let $x^{(\infty)}$ be the unique point common to the $C_i$. Then given $\delta > 0$ there is a $j$ such that the cube $\{|x - x^{(\infty)}| < \delta\}$ contains both $x^{(j)}$ and $\bar{x}^{(j)}$. Hence either $|f(x^{(j)}) - f(x^{(\infty)})| \ge \epsilon/2$ or $|f(\bar{x}^{(j)}) - f(x^{(\infty)})| \ge \epsilon/2$, which shows that $f$ is not continuous at $x^{(\infty)}$. Therefore if $f$ is continuous at all points of $X$ it must be uniformly continuous.


Proof of Theorem 1

By Theorem 2 there is an $i$ such that $|f(x) - f(x')| < 1$ whenever $|x - x'| < 10^{-i}$. Divide $\mathbb{R}^n$ into cubes of side $10^{-i}$ and evaluate $f$ at one point of $X$ in each cube which contains a point of $X$. This gives a finite number of values of $f$. Let $K$ be the largest of these values. Then $f$ assumes no value greater than $K + 1$ on $X$, i.e. $f$ is bounded on $X$. For each $j = 1, 2, 3, \ldots$, let $q_j$ be the largest $j$-place decimal fraction such that $f$ assumes values greater than or equal to $q_j$ on $X$, and let $x^{(j)}$ be a point of $X$ such that $f(x^{(j)}) \ge q_j$. For every $N$, the terms $q_j$ with $j \ge N$ are contained in the interval $\{q_N \le x \le q_N + 10^{-N}\}$, hence the sequence $q_j$ converges to a real number, say $r$. Since $f$ assumes no value on $X$ greater than $r$ (because such a value would be greater than $q_j + 10^{-j}$ for some $j$), it suffices to show that $f$ assumes the value $r$. Let $C_0 \supset C_1 \supset C_2 \supset \cdots$ be a nested sequence of cubes such that $C_i$ has side $10^{-i}$ and contains an infinite number of the points $x^{(j)}$, and let $x^{(\infty)}$ be the point common to all $C_i$. On any cube $\{|x - x^{(\infty)}| < \delta\}$ the function $f$ assumes values (at the $x^{(j)}$) arbitrarily near $r$. By the continuity of $f$, this implies that $f(x^{(\infty)}) = r$, as was to be shown.
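When $f$ is known only on a finite sample of $X$, the decimal fractions $q_j$ of this proof can be computed directly. A finite set is itself compact, so Theorem 1 applies to it. The sketch below assumes, purely for illustration, the function $f(x) = x(1 - x)$ sampled on the grid $k/1000$ of $[0, 1]$, whose maximum is $r = 1/4$. The helper name is hypothetical.

```python
from fractions import Fraction
import math


def decimal_lower_bounds(f, points, places):
    """For j = 1..places return (q_j, x_j): q_j is the largest j-place decimal
    fraction such that f takes a value >= q_j on the sample `points`, and x_j
    is a sample point with f(x_j) >= q_j."""
    top, arg_top = max((Fraction(f(x)), x) for x in points)  # max over the finite sample
    result = []
    for j in range(1, places + 1):
        q_j = Fraction(math.floor(top * 10 ** j), 10 ** j)
        result.append((q_j, arg_top))
    return result


# Assumed example: f(x) = x(1 - x) sampled on the grid k/1000 of [0, 1].
grid = [Fraction(k, 1000) for k in range(1001)]
for q, x in decimal_lower_bounds(lambda t: t * (1 - t), grid, 4):
    print(float(q), x)
```

As in the proof, each $q_j$ satisfies $q_j \le r < q_j + 10^{-j}$.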


The proofs of many theorems about differentiable 
functions can be simplified using the following theorem: 


Theorem 3 


A (continuously) differentiable function on a compact set is uniformly differentiable. More precisely, a function $f: \mathbb{R}^n \to \mathbb{R}^m$ which is defined and continuously differentiable at all points within $\delta$ of a compact subset $X$ of $\mathbb{R}^n$ (for some $\delta > 0$) is uniformly differentiable on $X$.


Proof

By assumption the function

$$M_{s,x}(h) = \frac{f(x + sh) - f(x)}{s}$$

is defined for $x$ in $X$, $s$ a real number $0 < |s| < \delta$, and $h$ an $n$-tuple $|h| \le 1$. The $n$ functions $\partial f/\partial x_i : \mathbb{R}^n \to \mathbb{R}^m$ are defined by

$$\frac{\partial f}{\partial x_i}(x) = \lim_{s \to 0} M_{s,x}(0, 0, \ldots, 0, 1, 0, \ldots, 0) \qquad (1 \text{ in the } i\text{th place}).$$

On the basis of the assumption that they exist and are continuous at all points $x$ within $\delta$ of $X$, it is to be shown that $M_{s,x}(h)$ is uniformly continuous on the set $\{0 < |s| < \delta,\ x \text{ in } X,\ |h| \le 1\}$. The main step in the proof is, as in §5.3, to use the Fundamental Theorem of Calculus to write $M_{s,x}(h)$ as a sum of integrals of partial derivatives of $f$ over $n$ line segments parallel to the coordinate axes:

$$(1)\qquad M_{s,x}(h) = \sum_{i=1}^{n} \int_0^{h_i} \frac{\partial f}{\partial x_i}(x_1 + sh_1, \ldots, x_{i-1} + sh_{i-1}, x_i + sy, x_{i+1}, \ldots, x_n)\, dy.$$
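Formula (1) can be checked numerically for a particular function. The sketch below assumes $n = 2$, $m = 1$ and the sample function $f(x_1, x_2) = x_1^2 x_2 + \sin x_2$, which is chosen only for illustration. It approximates each one-dimensional integral with composite Simpson's rule. The helper names are hypothetical.

```python
import math


def f(x1, x2):
    """Sample function (an assumption for illustration): x1**2 * x2 + sin x2."""
    return x1 ** 2 * x2 + math.sin(x2)


def df_dx1(x1, x2):
    return 2 * x1 * x2


def df_dx2(x1, x2):
    return x1 ** 2 + math.cos(x2)


def simpson(g, a, b, n=200):
    """Composite Simpson's rule for the integral of g from a to b (n even)."""
    step = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * step)
    return total * step / 3


def difference_quotient(x, s, h):
    """M_{s,x}(h) = (f(x + s h) - f(x)) / s."""
    return (f(x[0] + s * h[0], x[1] + s * h[1]) - f(x[0], x[1])) / s


def formula_one(x, s, h):
    """Right side of (1) for n = 2: first a segment parallel to the x_1-axis,
    then a segment parallel to the x_2-axis starting from (x_1 + s h_1, x_2)."""
    first = simpson(lambda y: df_dx1(x[0] + s * y, x[1]), 0.0, h[0])
    second = simpson(lambda y: df_dx2(x[0] + s * h[0], x[1] + s * y), 0.0, h[1])
    return first + second


x, s, h = (0.7, -0.3), 0.25, (0.9, -0.6)
print(difference_quotient(x, s, h), formula_one(x, s, h))
```

The two printed numbers agree to within the quadrature error. Simpson's rule also handles a negative upper limit $h_i$, because the step size simply becomes negative.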


To say that $M_{s,x}(h)$ is uniformly continuous on the set $\{0 < |s| < \delta,\ x \text{ in } X,\ |h| \le 1\}$ means that $|M_{s,x}(h) - M_{s',x'}(h')|$ can be made arbitrarily small (in all $m$ coordinates) for $(s, x, h)$ and $(s', x', h')$ in this set by making $|s - s'|$, $|x - x'|$, and $|h - h'|$ sufficiently small. As usual, one makes the change from $(s, x_1, x_2, \ldots, x_n, h_1, h_2, \ldots, h_n)$ to $(s', x_1', x_2', \ldots, x_n', h_1', h_2', \ldots, h_n')$ one coordinate at a time and shows that for each such change the change in the value of $M_{s,x}(h)$ is small. Then, by the triangle inequality, the total change from $M_{s,x}(h)$ to $M_{s',x'}(h')$ is small.

Specifically, if $s$ is changed to $s'$ then the integrands in (1) are changed from the values of $\partial f/\partial x_i$ at $(x_1 + sh_1, \ldots, x_n)$ to their values at another point $(x_1 + s'h_1, \ldots, x_n)$, which is a distance of at most $|(s - s')h_i| \le |s - s'|$ away in any coordinate direction. The functions $\partial f/\partial x_i$ are continuous on the set of all points within $\delta$ of $X$, and this set is compact (see Exercise 3); hence by Theorem 2 the change in the integrands can be made uniformly small by making $|s - s'|$ small. The change in the integrals is at most the change in the integrands times the length of the intervals of integration. It follows that the change in (1) resulting from changing $s$ to $s'$ can be made arbitrarily small by making $|s - s'|$ sufficiently small. By the same argument, the change in (1) resulting from changing $x_i$ to $x_i'$ can be made arbitrarily small by making $|x_i - x_i'|$ sufficiently small. Finally, if $h_i$ is changed to $h_i'$, the first $i - 1$ integrals in (1) are not changed at all, and the last $n - i$ are changed only slightly (by the same argument). In the $i$th integral, the upper limit of integration is changed from $h_i$ to $h_i'$, which changes its value by at most $|h_i - h_i'|$ times the largest value of the integrand $\partial f/\partial x_i$. Since the integrand is bounded (Theorem 1), this change can be made arbitrarily small by making $|h_i - h_i'|$ small. This completes the proof of the theorem.


The use of the word 'compact' in the phrase 'compact, oriented, differentiable, $k$-dimensional manifold-with-boundary' in Chapter 6 is easily seen to agree with its definition in this section. On the one hand, any set $S$ in $\mathbb{R}^n$ which can be described by a finite number of charts as in Chapter 6 is easily seen to be compact (use the Bolzano-Weierstrass Theorem). On the other hand, any $k$-dimensional differentiable manifold in $\mathbb{R}^n$ (see §5.5) which is compact can be described by a finite number of charts (the Heine-Borel Theorem). Suppose a $k$-dimensional 'manifold-with-boundary' is defined to be a set which can be parameterized locally by a $k$-dimensional rectangle. Suppose also that an 'oriented, differentiable $k$-dimensional manifold' is defined to be a manifold for which a non-zero $k$-form is specified. Then the separate meanings of all the terms in the phrase 'compact, oriented, differentiable, $k$-dimensional manifold-with-boundary' become clear.

The proofs of this section resemble very strongly the proof of the Fundamental Theorem of Calculus (§3.1) and the proof of the theorem which defines $\int_S \omega$ (§6.3). All these proofs involve a nested set of cubes or rectangles and an argument to a contradiction based on the fact that there is a limit point $x^{(\infty)}$ contained in them all. In fact, Theorems 2 and 3 can be used in the proofs of the Fundamental Theorem and of the definition of $\int_S \omega$. They show that all continuous functions are uniformly continuous and all differentiable functions are uniformly differentiable on the compact domain of integration. After that, these theorems are easily proved (see Exercises 5 and 15 of §9.3) without recourse to the subdivision arguments given in Chapter 3 and Chapter 6.

Exercises

1 Let $C_0 \supset C_1 \supset C_2 \supset \cdots$ be a nested set of cubes in $\mathbb{R}^n$, and let $C_i$ have side $10^{-i}$. On the basis of the completeness of the real number system, show that there is exactly one point $x^{(\infty)}$ contained in all the $C_i$.


2 Using the Heine-Borel Theorem, show that given a nested set of cubes $C_0 \supset C_1 \supset C_2 \supset \cdots$, there is at least one point which lies in them all. [Use the compactness of $C_0$.]


3 Prove that if $X$ is a compact set in $\mathbb{R}^n$ and if $X_\delta$ is the set of all points which lie within $\delta$ of $X$, then $X_\delta$ is compact.


4 Show that if $f: \mathbb{R}^n \to \mathbb{R}^m$ is a continuous function and if $X$ is a compact set in $\mathbb{R}^n$ then its image $f(X) = \{$all points of the form $f(x)$ where $x$ is in $X\}$ is a compact set in $\mathbb{R}^m$. [Use Bolzano-Weierstrass.]

5 Differentiation under the integral sign. Prove that if $f(x, y)$ is differentiable on $\{a \le x \le b,\ c \le y \le d\}$ then $F(y) = \int_a^b f(x, y)\,dx$ is differentiable on $\{c \le y \le d\}$ and its derivative is $\int_a^b \frac{\partial f}{\partial y}(x, y)\,dx$. [See Exercise 16, §9.3.]

9.5 Other Types of Limits

The limits discussed in the preceding chapters are limits of sequences, derivatives of functions, and integrals of forms over compact manifolds. For purposes of comparison the definitions of these limits will be reviewed before giving the analogous definitions for infinite series, infinite products, and improper integrals.

A sequence $x^{(1)}, x^{(2)}, x^{(3)}, \ldots$ of points in $\mathbb{R}^n$ is said to be convergent if for every $\epsilon > 0$ there is a natural number $N$ such that* $|x^{(n)} - x^{(m)}| < \epsilon$ whenever $n, m > N$. When this is the case, the sequence determines a unique point $x^{(\infty)}$ in $\mathbb{R}^n$, called the limit of the sequence. It has the property that for every $\epsilon > 0$ there is a natural number $N$ such that $|x^{(n)} - x^{(\infty)}| < \epsilon$ whenever $n > N$.

*Here and in the remainder of this section $|x - x'|$ denotes, for $x, x'$ in $\mathbb{R}^n$, the maximum absolute value of the difference of corresponding coordinates. That is, if $x = (x_1, x_2, \ldots, x_n)$ and $x' = (x_1', x_2', \ldots, x_n')$ then $|x - x'| = \max(|x_1 - x_1'|, |x_2 - x_2'|, \ldots, |x_n - x_n'|)$.

A function $f: \mathbb{R}^n \to \mathbb{R}^m$ is said to be uniformly differentiable on a subset $X$ of $\mathbb{R}^n$ if there is a $\delta > 0$ such that the function

$$M_{s,x}(h) = \frac{f(x + sh) - f(x)}{s}$$

is defined and uniformly continuous for all $(s, x, h)$ such that $s$ is a real number $0 < |s| < \delta$, $x$ is an $n$-tuple in $X$, and $h$ is an $n$-tuple $|h| \le 1$. When this is the case, the formula

$$M_{0,x}(h) = \lim_{s \to 0} M_{s,x}(h) = \lim_{s \to 0} \frac{f(x + sh) - f(x)}{s}$$

defines an $m$-tuple $M_{0,x}(h)$ for all $x$ in $X$ and all $n$-tuples $h$. The function $M_{0,x}(h)$ is called the derivative of $f$.


The integral $\int_R A\,dx_1\,dx_2 \ldots dx_k$ of a $k$-form over a $k$-dimensional rectangle is said to be convergent if for every $\epsilon > 0$ there is a $\delta > 0$ such that $|\Sigma(\alpha) - \Sigma(\alpha')| < \epsilon$ whenever $\Sigma(\alpha)$, $\Sigma(\alpha')$ are approximating sums to $\int_R A\,dx_1\,dx_2 \ldots dx_k$ in which the mesh sizes $|\alpha|$, $|\alpha'|$ are less than $\delta$. (See Chapter 2 for the definition of $\Sigma(\alpha)$.) When this is the case, there is a unique real number, denoted $\int_R A\,dx_1\,dx_2 \ldots dx_k$ and called the integral of $A\,dx_1\,dx_2 \ldots dx_k$ over $R$. It has the property that for every $\epsilon > 0$ there is a $\delta > 0$ such that $\left|\int_R A\,dx_1\,dx_2 \ldots dx_k - \Sigma(\alpha)\right| < \epsilon$ whenever $\Sigma(\alpha)$ is an approximating sum to $\int_R A\,dx_1\,dx_2 \ldots dx_k$ in which the mesh size $|\alpha|$ is less than $\delta$. The integral of a continuous $k$-form over a compact, oriented, differentiable, $k$-dimensional manifold-with-boundary was defined in §6.4 as a sum of a finite number of convergent integrals over rectangles.

The essence of the idea is that of a process for determining a number. The process is convergent if the number which results can be determined to within any margin for error $\epsilon > 0$ by carrying the process out with a sufficient (finite) degree of accuracy. When this is the case the process determines a real number which is called its limit. Other processes of this kind are infinite series, infinite products, and improper integrals.

Infinite Series

An infinite series* $a_1 + a_2 + a_3 + \cdots$ in which the $a_i$ are real numbers is said to be convergent if for every $\epsilon > 0$ there is a natural number $N$ such that the sum of the first $n$ terms differs by less than $\epsilon$ from the sum of the first $m$ terms whenever $n, m > N$. Now

$$(a_1 + a_2 + \cdots + a_n) - (a_1 + a_2 + \cdots + a_m) = \begin{cases} a_{m+1} + a_{m+2} + \cdots + a_n & \text{if } n > m, \\ -a_{n+1} - a_{n+2} - \cdots - a_m & \text{if } m > n. \end{cases}$$

So this is the same as saying that $a_1 + a_2 + a_3 + \cdots$ is convergent if for every $\epsilon > 0$ there is an $N$ such that $|a_{m+1} + a_{m+2} + \cdots + a_n| < \epsilon$ whenever $n > m > N$. When this is the case there is a unique real number, called the sum of the series and denoted $\sum_{i=1}^{\infty} a_i$. It has the property that for every $\epsilon > 0$ there is an $N$ such that $\left|\sum_{i=1}^{\infty} a_i - \sum_{i=1}^{n} a_i\right| < \epsilon$ whenever $n > N$ (where $\sum_{i=1}^{n} a_i$ denotes $a_1 + a_2 + \cdots + a_n$).

*In everyday speech the words 'series' and 'sequence' are more or less synonymous, but in mathematical terminology they are very sharply distinguished. A series is a sum, and a sequence is merely an infinite list (of numbers, points, functions, etc.).
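For a concrete series the convergence criterion can be checked with exact arithmetic. The sketch below takes $a_i = 1/i^2$ as an assumed example. Comparing with $\frac{1}{(i-1)i} = \frac{1}{i-1} - \frac{1}{i}$ gives $a_{m+1} + \cdots + a_n < \frac1m - \frac1n < \frac1m$. Therefore any $N \ge 1/\epsilon$ works, since $n > m > N$ then implies $\frac1m < \epsilon$. The code can only verify the criterion on a finite range of $m$ and $n$, and its function names are hypothetical.

```python
from fractions import Fraction
import math


def partial_sums(upto):
    """[s_0, s_1, ..., s_upto] with s_n = 1 + 1/4 + ... + 1/n**2, computed exactly."""
    sums = [Fraction(0)]
    for i in range(1, upto + 1):
        sums.append(sums[-1] + Fraction(1, i * i))
    return sums


def cauchy_N(eps):
    """An N that works for eps: a_{m+1} + ... + a_n < 1/m < eps when n > m > N."""
    return math.ceil(1 / eps)


def check_cauchy(eps, upto):
    """Verify |s_n - s_m| < eps for all N < m < n <= upto (a finite check only)."""
    N = cauchy_N(eps)
    sums = partial_sums(upto)
    return all(abs(sums[n] - sums[m]) < eps
               for m in range(N + 1, upto + 1)
               for n in range(m + 1, upto + 1))


print(cauchy_N(Fraction(1, 50)), check_cauchy(Fraction(1, 50), 200))
```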


Infinite Products

An infinite product $a_1 a_2 a_3 \ldots$ in which the $a_i$ are real numbers is said to be convergent if for every $\epsilon > 0$ there is an $N$ such that* $|a_{m+1} a_{m+2} \ldots a_n - 1| < \epsilon$ whenever $n > m > N$. When this is the case there is† a unique real number, called the product of the $a_i$ and denoted $\prod_{i=1}^{\infty} a_i$. It has the property that for every $\epsilon > 0$ there is an $N$ such that $\left|\prod_{i=1}^{\infty} a_i - \prod_{i=1}^{n} a_i\right| < \epsilon$ whenever $n > N$ (where $\prod_{i=1}^{n} a_i$ denotes $a_1 a_2 \ldots a_n$).

*If none of the factors $a_i$ are zero, this is the same as saying that the product of the first $n$ factors divided by the product of the first $m$ factors differs from 1 by less than $\epsilon$.

†This statement requires proof. See Exercise 10.

‡An integral is improper if the domain of integration is not compact.

§Here, once again, rectangles are used merely because they are the most simple compact domains. See §9.6 (p. 410) for a discussion of the possibility of using more general domains than rectangles.
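The definition can be tried on a product whose partial products have a closed form. Take as an assumed example the factors $a_i = 1 - \frac{1}{(i+1)^2} = \frac{i(i+2)}{(i+1)^2}$. The product of the first $n$ factors telescopes to $\frac{n+2}{2(n+1)}$, so the product converges to $\frac12$. Moreover, $|a_{m+1} \cdots a_n - 1| = \frac{n-m}{(n+1)(m+2)} < \frac{1}{m+2}$. The sketch checks these formulas exactly. The function names are hypothetical.

```python
from fractions import Fraction


def factor(i):
    """Assumed example factor a_i = 1 - 1/(i + 1)**2."""
    return 1 - Fraction(1, (i + 1) ** 2)


def block_product(m, n):
    """a_{m+1} a_{m+2} ... a_n (the empty product 1 when n == m)."""
    p = Fraction(1)
    for i in range(m + 1, n + 1):
        p *= factor(i)
    return p


print(block_product(0, 1000), float(block_product(0, 1000)))  # partial product; the limit is 1/2
print(abs(block_product(100, 1000) - 1))                      # less than 1/(100 + 2)
```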


Improper Integrals‡

The integral $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} A(x, y)\,dx\,dy$ of a 2-form over the entire $xy$-plane is said to converge if two conditions hold. First, the integrals of $A(x, y)\,dx\,dy$ over finite rectangles§ $R$ must converge. Second, for every $\epsilon > 0$ there must be a number $K > 0$ such that $\int_R A(x, y)\,dx\,dy$ differs by less than $\epsilon$ from $\int_{R'} A(x, y)\,dx\,dy$ whenever $R, R'$ are rectangles which contain the square $\{|x| < K, |y| < K\}$. When this is the case there is a unique real number, denoted $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} A(x, y)\,dx\,dy$. It has the property that for every $\epsilon > 0$ there is a $K$ such that $\left|\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} A(x, y)\,dx\,dy - \int_R A(x, y)\,dx\,dy\right| < \epsilon$ whenever $R$ is a rectangle containing the square $\{|x| < K, |y| < K\}$. Other improper integrals are defined analogously.


Examples and Applications

The usual tests for convergence of series are easy consequences of the above definition of convergence. For example, the ratio test states that if $\lim_{n \to \infty} |a_{n+1}/a_n|$ exists and is less than one then the series $a_1 + a_2 + a_3 + \cdots$ converges. This is proved by observing that

$$|a_{n+1}| \le \rho\,|a_n| \qquad (\rho < 1)$$

for all sufficiently large $n$, say for all $n \ge N$. Hence, for $n > m \ge N$,

$$\begin{aligned} |a_{m+1} + a_{m+2} + \cdots + a_n| &\le |a_{m+1}| + |a_{m+2}| + \cdots + |a_n| \\ &\le \rho|a_m| + \rho^2|a_m| + \cdots + \rho^{n-m}|a_m| \\ &= \frac{\rho - \rho^{n-m+1}}{1 - \rho}\,|a_m| \le \frac{1}{1 - \rho}\,|a_m|. \end{aligned}$$

Since $|a_m| \le \rho^{m-N}|a_N| \to 0$ as $m \to \infty$, this proves that the series converges. This test, applied to the series $x^{(1)} + (x^{(2)} - x^{(1)}) + (x^{(3)} - x^{(2)}) + \cdots$, was used repeatedly in Chapter 7. For example, it proves that the series

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots, \qquad \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$$

generated by Picard's iteration are convergent.
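The ratio-test estimate also bounds the error of a partial sum. Suppose $|a_{k+1}| \le \rho\,|a_k|$ for all $k \ge m$. Letting $n \to \infty$ in the geometric estimate gives $|a_{m+1}| + |a_{m+2}| + \cdots \le \frac{\rho}{1 - \rho}|a_m|$. The sketch below applies this, as an illustration, to the exponential series $\sum_k x^k/k!$. There $|a_{k+1}/a_k| = |x|/(k+1) \le |x|/(m+1)$ for $k \ge m$. The function name is hypothetical.

```python
import math


def exp_with_ratio_bound(x, m):
    """Partial sum 1 + x + ... + x**m/m! of the exponential series and the
    ratio-test bound rho/(1 - rho) * |x**m/m!| on the remaining tail,
    where rho = |x|/(m + 1) bounds |a_{k+1}/a_k| for every k >= m."""
    terms = [x ** k / math.factorial(k) for k in range(m + 1)]
    rho = abs(x) / (m + 1)
    if rho >= 1:
        raise ValueError("need m + 1 > |x| so that rho < 1")
    bound = abs(terms[-1]) * rho / (1 - rho)
    return math.fsum(terms), bound


s, bound = exp_with_ratio_bound(2.0, 15)
print(s, bound, abs(math.exp(2.0) - s) <= bound)
```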
A series $a_1 + a_2 + a_3 + \cdots$ in which all terms are non-negative, $a_i \ge 0$, behaves in one of two ways. Either it is convergent, or its 'partial sums' $\sum_{i=1}^{n} a_i$ can be made arbitrarily large, that is, given any real number $K$ there is a natural number $n$ such that $\sum_{i=1}^{n} a_i > K$. The proof of this fact is a simple reformulation of the denial of the statement '$\sum_{i=1}^{\infty} a_i$ is convergent' (Exercise 10, §9.1). This theorem is often stated as follows: A series $\sum_{i=1}^{\infty} a_i$ in which all terms are positive either converges to a finite sum or diverges to $+\infty$.
A very interesting infinite product is

$$(1)\qquad \sin x = x\left(1 - \frac{x}{\pi}\right)\left(1 + \frac{x}{\pi}\right)\left(1 - \frac{x}{2\pi}\right)\left(1 + \frac{x}{2\pi}\right)\cdots = x\prod_{n=1}^{\infty}\left(1 - \frac{x^2}{n^2\pi^2}\right).$$

This formula was discovered by Euler. He observed that $\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$ is a 'polynomial of infinite degree' which has roots at $x = 0, \pm\pi, \pm 2\pi, \ldots$. By analogy with ordinary polynomials, he concluded that $\sin x$ should be the product of the factors on the right side of (1). A rigorous proof of (1) based on this idea is given in the next section.*

*For an excellent discussion of Euler's discovery see G. Polya, Induction and Analogy in Mathematics, Princeton University Press, 1954, pp. 17-22.
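Partial products of (1) can be compared with the sine function numerically. They converge slowly. As a heuristic (not a proof), the omitted factors multiply to roughly $1 - \frac{x^2}{\pi^2 N}$, because $\sum_{n > N} 1/n^2 < 1/N$. So the relative error after $N$ factors is of order $x^2/(\pi^2 N)$. The function name below is hypothetical.

```python
import math


def sin_partial_product(x, n_factors):
    """x * (1 - x**2/pi**2) * (1 - x**2/(4 pi**2)) * ... with n_factors factors,
    i.e. the N-th partial product of formula (1)."""
    p = x
    for k in range(1, n_factors + 1):
        p *= 1 - x * x / (k * k * math.pi * math.pi)
    return p


for x in (0.5, 1.0, 2.0, 3.0):
    print(x, sin_partial_product(x, 10_000), math.sin(x))
```

The factor with $k = 1$ vanishes exactly at $x = \pi$, so the partial products reproduce the root of $\sin x$ there for every $N \ge 1$.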


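Euler had set out to find the sum of the series

$$1 + \frac14 + \frac19 + \frac1{16} + \frac1{25} + \cdots = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots.$$

It was known that this series was convergent

$$\left(\frac{1}{2^2} + \frac{1}{3^2} + \cdots + \frac{1}{n^2} < \frac{1}{1\cdot 2} + \frac{1}{2\cdot 3} + \cdots + \frac{1}{(n-1)n} = \left(1 - \frac12\right) + \left(\frac12 - \frac13\right) + \cdots + \left(\frac{1}{n-1} - \frac1n\right) = 1 - \frac1n < 1\right)$$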


and the sum had been found to several decimal places to be $1.644934\ldots$, but the exact evaluation was a famous unsolved problem of the day. By equating the coefficients of $x^3$ in the formula (1), Euler concluded that

$$-\frac{1}{3!} = -\frac{1}{\pi^2} - \frac{1}{4\pi^2} - \frac{1}{9\pi^2} - \frac{1}{16\pi^2} - \cdots$$

and hence gave the correct value

$$1 + \frac14 + \frac19 + \frac1{16} + \cdots = \frac{\pi^2}{6}.$$

This example illustrates the vast difference between the statement 'the series $\sum a_i$ is convergent' and the statement 'the sum of the series $\sum a_i$ is such-and-such'. Even Euler was unable to find the sum of the convergent series

$$1 + \frac{1}{2^3} + \frac{1}{3^3} + \frac{1}{4^3} + \cdots$$

(although, using (1), he was able to find $\sum_{n=1}^{\infty} 1/n^k$ for all even values of $k$, e.g. $1 + \frac{1}{2^4} + \frac{1}{3^4} + \frac{1}{4^4} + \cdots = \frac{\pi^4}{90}$; see Exercise 9, §9.6).

*This condition is essential. See E. C. Titchmarsh, Theory of Functions, Oxford University Press, 1939, p. 17, for examples in which $\prod (1 + b_i)$ converges but $\sum b_i$ diverges and vice versa.
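Euler's values can be compared with partial sums. This is an illustration only, not Euler's method. For the tail of $\sum 1/i^2$, comparison with $\frac{1}{i(i+1)}$ and $\frac{1}{(i-1)i}$ gives $\frac{1}{n+1} < \sum_{i > n} \frac{1}{i^2} < \frac1n$. So $s_n + \frac{1}{n+1}$ and $s_n + \frac1n$ bracket the sum. The sketch checks that $\pi^2/6$ lies in this bracket. It also checks that the partial sums of $\sum 1/i^4$ approach $\pi^4/90$. The helper name is hypothetical.

```python
import math


def zeta_partial(k, n):
    """1 + 1/2**k + ... + 1/n**k, summed with math.fsum for accuracy."""
    return math.fsum(1.0 / i ** k for i in range(1, n + 1))


n = 1000
s = zeta_partial(2, n)
lower, upper = s + 1 / (n + 1), s + 1 / n
print(lower, math.pi ** 2 / 6, upper)
print(zeta_partial(4, n), math.pi ** 4 / 90)
```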
The fact that the infinite product (1) converges for all $x$ is a special case of the following theorem. If $b_1, b_2, b_3, \ldots$ is a sequence of non-negative numbers $b_i \ge 0$*, then either

$$\prod (1 + b_i), \qquad \prod (1 - b_i), \qquad \sum b_i$$

all converge or none converge. Roughly speaking, none can converge unless $b_i \to 0$. If $b_i \to 0$, then products of two or more $b$'s can be neglected relative to the $b$'s themselves, and

$$(1 + b_{m+1})(1 + b_{m+2}) \cdots (1 + b_n) = 1 + (b_{m+1} + b_{m+2} + \cdots + b_n) + \text{small}.$$

Hence $\prod_{i=m+1}^{n}(1 + b_i)$ is near 1 if and only if $\sum_{i=m+1}^{n} b_i$ is near zero, which is the statement of the theorem. For a rigorous proof, see Exercises 12-14.
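The theorem can be made plausible numerically with the elementary inequalities $1 + (b_1 + \cdots + b_n) \le (1 + b_1)\cdots(1 + b_n) \le e^{b_1 + \cdots + b_n}$. Both hold for $b_i \ge 0$: expand the product for the first, and use $1 + b \le e^b$ for the second. The sketch compares the three quantities for two assumed sequences. For $b_i = 1/i$ the partial products telescope to exactly $n + 1$. For $b_i = 1/i^2$ all three quantities stay bounded. The function name is hypothetical.

```python
import math
from fractions import Fraction


def sum_product_bounds(b, n):
    """For b_1, ..., b_n >= 0 return (1 + sum, product, exp(sum)), i.e. the three
    quantities in 1 + sum b_i <= prod (1 + b_i) <= exp(sum b_i)."""
    total = sum((b(i) for i in range(1, n + 1)), Fraction(0))
    product = Fraction(1)
    for i in range(1, n + 1):
        product *= 1 + b(i)
    return 1 + total, product, math.exp(total)


# b_i = 1/i: the product telescopes to n + 1, and the sum grows like log n.
print([float(v) for v in sum_product_bounds(lambda i: Fraction(1, i), 200)])
# b_i = 1/i**2: all three stay bounded.
print([float(v) for v in sum_product_bounds(lambda i: Fraction(1, i * i), 200)])
```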
This theorem implies that the harmonic series $1 + \frac12 + \frac13 + \frac14 + \frac15 + \cdots$ is divergent, because the product $(1 + 1)(1 + \frac12)(1 + \frac13)(1 + \frac14)\cdots = 2 \cdot \frac32 \cdot \frac43 \cdot \frac54 \cdots$ is obviously divergent. Euler used a similarly simple argument (Exercise 16) to prove that the series $\frac12 + \frac13 + \frac15 + \frac17 + \frac1{11} + \cdots$ of reciprocals of prime numbers is also divergent. This was a substantial strengthening of Euclid's theorem that there are infinitely many primes.

The improper integral $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-x^2-y^2}\,dx\,dy$ can be evaluated by converting to polar coordinates $x = r\cos\theta$, $y = r\sin\theta$. The integral over the disk of radius $R$ is

$$\int_0^{2\pi}\int_0^R e^{-r^2}\,r\,dr\,d\theta = \int_0^{2\pi}\int_0^R d\left[-\tfrac12 e^{-r^2}\right]d\theta = \int_0^{2\pi}\tfrac12\left(1 - e^{-R^2}\right)d\theta = \pi\left(1 - e^{-R^2}\right).$$

This shows that the integral over any rectangle containing the square $\{|x| < K, |y| < K\}$ is at least $\pi(1 - e^{-K^2})$ and at most $\pi$; hence the integral converges to $\pi$. On the other hand, the integral of $e^{-x^2-y^2}\,dx\,dy$ over the square $\{|x| < K, |y| < K\}$ is easily seen, by comparing approximating sums, to be $\left(\int_{-K}^{K} e^{-t^2}\,dt\right)^2$. This proves that the improper integral $\int_{-\infty}^{\infty} e^{-t^2}\,dt$ converges and has the value

$$\int_{-\infty}^{\infty} e^{-t^2}\,dt = \sqrt{\pi}.$$

Exercises

1 Prove that the series $1 + x + x^2/2! + x^3/3! + \cdots$ is convergent for all values of $x$.


2 Prove that the series $1 + x + x^2 + x^3 + \cdots$ is convergent if and only if $|x| < 1$. Show that the sum is $(1 - x)^{-1}$ (when $|x| < 1$).


3 Prove that the product $(1 + x)(1 + x^2)(1 + x^4)(1 + x^8)\cdots$ is convergent if and only if $|x| < 1$. Show that the product is $(1 - x)^{-1}$ (when $|x| < 1$).


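4 Find $\pi^2/6$ to five decimal places. Estimate the number of terms of the series $1 + \frac14 + \frac19 + \frac1{16} + \cdots$ which would have to be added in order to obtain this accuracy. [Compare $\frac{1}{(m+1)^2} + \frac{1}{(m+2)^2} + \cdots$ to the sum $\frac{1}{m(m+1)} + \frac{1}{(m+1)(m+2)} + \cdots$.]

5 Deduce Wallis' formula

$$\frac{\pi}{2} = \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdot\frac{6}{5}\cdot\frac{6}{7}\cdots$$

from the product formula (1).

6 The series $1 - 1 + \frac12 - \frac12 + \frac13 - \frac13 + \frac14 - \frac14 + \cdots$ converges to zero. Suppose it is rearranged so that $k$ terms with the sign $+$ are taken for each term with the sign $-$, that is, rearranged as

$$\left(1 + \frac12 + \cdots + \frac1k\right) - 1 + \left(\frac{1}{k+1} + \cdots + \frac{1}{2k}\right) - \frac12 + \left(\frac{1}{2k+1} + \cdots + \frac{1}{3k}\right) - \frac13 + \cdots.$$

Show that the rearranged series then converges to $\log k$. [… to prove that the sum of the first $N$ terms is within $\epsilon$ of $\log k$ for all sufficiently large $N$.] In particular conclude that

$$1 - \frac12 + \frac13 - \frac14 + \frac15 - \cdots = \log 2.$$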


7 Describe a rearrangement of the series of Exercise 6 which 
converges to the sum 10. [Do not attempt to find a formula 
for the rearrangement, but describe the order in which 
positive and negative terms are to be taken.] 


8 A series $b_1 + b_2 + b_3 + \cdots$ is said to be a rearrangement of a series $a_1 + a_2 + a_3 + \cdots$ if each $a_i$ occurs exactly once among the $b_j$ and vice versa (making the obvious provisions regarding numbers which may occur more than once among the $a$'s). The preceding examples show that rearrangement of a convergent series may alter its sum. Suppose $a_1 + a_2 + a_3 + \cdots$ is absolutely convergent, that is, the series $|a_1| + |a_2| + |a_3| + \cdots$ is convergent. Show that then $a_1 + a_2 + a_3 + \cdots$ is convergent, any rearrangement of $a_1 + a_2 + a_3 + \cdots$ is also convergent, and the sum of any rearrangement is equal to the sum of the original series.


9 Show that if $a_1 + a_2 + a_3 + \cdots$ is a convergent series which is not absolutely convergent, then there is a rearrangement of the series which converges to 10 or, for that matter, to any sum whatsoever.




10 Prove that if an infinite product converges in the sense defined in the text then there is a number $\prod_{i=1}^{\infty} a_i$ as stated in the text. Show also that the number $\prod_{i=1}^{\infty} a_i$ is zero if and only if at least one of the factors $a_i$ is zero. [First choose $N$ such that $|a_{m+1}a_{m+2}\cdots a_n - 1| < \frac12$ whenever $n > m \ge N$. Let $P = a_1 a_2 \cdots a_N$. If $P = 0$ then an $a_i$ is zero and the products $\prod_{i=1}^{n} a_i$ are all zero for $n \ge N$. Otherwise $\left|\prod_{i=1}^{n} a_i\right| > \frac12|P|$ for all $n > N$, and the limit, if it exists, is not zero. Finally, $\left|\prod_{i=1}^{n} a_i\right| < \frac32|P|$ for all $n > N$. It follows that $\left|\prod_{i=1}^{n} a_i - \prod_{i=1}^{m} a_i\right|$ can be made small by making $n, m$ large.]


11 Prove that an infinite product $a_1 a_2 a_3 \ldots$ converges if and only if there is an $N$ such that $\log a_i$ is defined for $i \ge N$ and $\sum_{i=N}^{\infty} \log a_i$ is a convergent series.
i=N 


12 Prove that if $b_i \ge 0$ then $\prod_{i=1}^{\infty} (1 + b_i)$ converges if and only if $\sum b_i$ converges. [Use $1 + b_1 + b_2 + \cdots + b_n \le (1 + b_1)(1 + b_2)\cdots(1 + b_n) \le e^{b_1 + b_2 + \cdots + b_n}$.]


13 Prove that if $a_1 a_2 a_3 \ldots$ is a convergent product and if none of the $a_i$ are zero, then $a_1^{-1} a_2^{-1} a_3^{-1} \ldots$ converges to the reciprocal of $a_1 a_2 a_3 \ldots$.

14 Prove that if $b_i \ge 0$ then $\prod (1 + b_i)$ converges if and only if $\prod (1 - b_i)$ converges. [Show that $1 + b \le (1 - b)^{-1} \le 1 + 2b$ for small positive $b$.]

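15 By analogy with (1) write $\cos x$ as an infinite product. Use this to guess the sum of the series

$$1 + \frac{1}{3^2} + \frac{1}{5^2} + \frac{1}{7^2} + \cdots.$$

Verify the result using

$$1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots = \frac{\pi^2}{6}.$$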


16 Suppose $\sum \frac1p = \frac12 + \frac13 + \frac15 + \frac17 + \frac1{11} + \cdots$ ($p$ runs over the prime numbers) were a convergent series. Then the product $\prod_p \left(1 - \frac1p\right)^{-1}$ would converge, i.e.

$$\prod_p \left(1 + \frac1p + \frac1{p^2} + \frac1{p^3} + \cdots\right)$$

would converge. But, when expanded out, this product is $1 + \frac12 + \frac13 + \frac14 + \cdots$ (because every natural number can be written in just one way as a product of prime powers), which diverges. Hence $\sum \frac1p$ must diverge. Fill in the steps in this argument to prove that $\sum \frac1p$ diverges.

17 Evaluate f2, e-"® dt.

18 The integral $\int_0^1 x^{-a}\,dx$ is improper if $a > 0$. Why? For which values of $a$ does it converge?

19 The integral $\int_0^{\infty} e^{-t^2}\,dt$ is one half of the integral evaluated in the text, hence its value is $\frac12\sqrt{\pi}$. This value can also be obtained from

$$\left(\int_0^{\infty} e^{-t^2}\,dt\right)^2 = \int_0^{\infty}\int_0^{\infty} e^{-x^2-y^2}\,dx\,dy$$

by making the change of variable $y = mx$. Sketch the 'new coordinates' $(x, m)$, and evaluate the integral. As will be seen in the next section, the operation of changing the variable in an improper integral actually requires justification. However, in this example the integrand is positive, the improper integral converges absolutely, and the justification is immediate.

9.6 Interchange of Limits


Some of the most interesting and subtle facts in advanced 
calculus are related to the interchanging of two limit 
processes. This section contains several examples of such 
interchanges. 

It is important first to understand that limits cannot always be interchanged, as is illustrated by the following example. For each pair $m, n$ of natural numbers let $a_{mn}$ be $1$ if $m = n$, $-1$ if $m = n + 1$, and $0$ in all other cases. Then for each $n$ the series $\sum_{m=1}^{\infty} a_{mn}$ has a $1$, a $-1$, and the rest $0$'s, hence is convergent and has the sum zero. Thus $\sum_{n=1}^{\infty}\left(\sum_{m=1}^{\infty} a_{mn}\right)$ exists and is zero, because it is a series in which all terms are zero. On the other hand, the series $\sum_{n=1}^{\infty} a_{mn}$ converges to $0$ unless $m = 1$, in which case it converges to $1$. Thus

$$\sum_{m=1}^{\infty}\left(\sum_{n=1}^{\infty} a_{mn}\right) = 1 \ne 0 = \sum_{n=1}^{\infty}\left(\sum_{m=1}^{\infty} a_{mn}\right),$$

and the two limits $\sum_{n=1}^{\infty}$ and $\sum_{m=1}^{\infty}$ cannot be interchanged.
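The example can be reproduced mechanically. In this particular array every inner series has at most two nonzero terms. So the inner sums below are exact, and only the outer sums are truncated at an assumed cutoff $K$. That truncation is harmless here, because every outer term is $0$ except the $m = 1$ term. The function names are hypothetical.

```python
def a(m, n):
    """a_{mn} = 1 if m = n, -1 if m = n + 1, 0 otherwise (m, n >= 1)."""
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0


def inner_sum_over_m(n):
    """sum over m of a_{mn}; only m = n and m = n + 1 can be nonzero."""
    return sum(a(m, n) for m in range(1, n + 2))


def inner_sum_over_n(m):
    """sum over n of a_{mn}; only n = m - 1 (if >= 1) and n = m can be nonzero."""
    return sum(a(m, n) for n in range(1, m + 1))


K = 1000
print(sum(inner_sum_over_m(n) for n in range(1, K + 1)))  # 0
print(sum(inner_sum_over_n(m) for m in range(1, K + 1)))  # 1
```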


An expression of the form $\sum_{n=1}^{\infty}\left(\sum_{m=1}^{\infty} a_{mn}\right)$ is called a double series. A double series $\sum_{n=1}^{\infty}\left(\sum_{m=1}^{\infty} a_{mn}\right)$ is said to converge to the sum $S$ if each of the series $\sum_{m=1}^{\infty} a_{mn}$ ($n = 1, 2, 3, \ldots$) is convergent, say to $b_n$, and if the series $b_1 + b_2 + b_3 + \cdots$ converges to the sum $S$. It is easy to construct examples of convergent double series $\sum_{n=1}^{\infty}\left(\sum_{m=1}^{\infty} a_{mn}\right)$ for which the double series $\sum_{m=1}^{\infty}\left(\sum_{n=1}^{\infty} a_{mn}\right)$ obtained by summing in the reverse order is not convergent. And, as the above example shows, even if the reversed double series is convergent its sum need not be the same as that of the original series. Neither of these circumstances can occur, however, if all of the terms $a_{mn}$ are positive. That is: If $\sum_{n=1}^{\infty}\left(\sum_{m=1}^{\infty} a_{mn}\right)$ is a convergent double series in which all terms $a_{mn}$ are positive, then the double series $\sum_{m=1}^{\infty}\left(\sum_{n=1}^{\infty} a_{mn}\right)$ is also convergent, and the sums are equal:

$$\sum_{m=1}^{\infty}\left(\sum_{n=1}^{\infty} a_{mn}\right) = \sum_{n=1}^{\infty}\left(\sum_{m=1}^{\infty} a_{mn}\right).$$

[Margin array:
1 + 1/2 + 1/3 + 1/4 + ··· = div.
0 − 1/2 + 0 + 0 + ··· = −1/2
0 + 0 − 1/3 + 0 + ··· = −1/3
0 + 0 + 0 − 1/4 + ··· = −1/4
1 − 1/2 + 1/3 − 1/4 + 1/5 − 1/6 + 1/7 − ··· = log 2]
m=1 


Briefly, the proof of this theorem is as follows. To prove that each of the series $\sum_{n=1}^{\infty} a_{mn}$ is convergent, it suffices to show that for every $\epsilon > 0$ there is an $N$ such that any sum of a finite number of terms $a_{mn}$ in which one or both indices are greater than $N$ ($m > N$ or $n > N$) is less than $\epsilon$. To this end, the convergence of $\sum_{n=1}^{\infty}\left(\sum_{m=1}^{\infty} a_{mn}\right)$ can be used to find an $N_0$ such that $\sum_{n=n_1}^{n_2}\left(\sum_{m=1}^{\infty} a_{mn}\right) < \epsilon/2$ whenever $n_2 \ge n_1 > N_0$. Then for each $n = 1, 2, 3, \ldots, N_0$, the convergence of $\sum_{m=1}^{\infty} a_{mn}$ can be used to find an $N_n$ such that $\sum_{m=m_1}^{m_2} a_{mn} < \epsilon/(2N_0)$ whenever $m_2 \ge m_1 > N_n$. Let $N$ be the largest of the integers $N_0, N_n$ ($n = 1, 2, \ldots, N_0$). Then any finite sum of terms $a_{mn}$ in which $n > N$ or $m > N$ is at most

$$\sum_{n=1}^{N_0}\left(\sum_{m=N+1}^{\infty} a_{mn}\right) + \sum_{n=N_0+1}^{\infty}\left(\sum_{m=1}^{\infty} a_{mn}\right) \le N_0 \cdot \frac{\epsilon}{2N_0} + \frac{\epsilon}{2} = \epsilon.$$

[Figure: the array of terms $a_{mn}$, with the terms $m \le N$, $n \le N$ enclosed in a box. Any sum of a finite number of terms outside the box is less than $\epsilon$.]

Thus $\sum_{n=1}^{\infty} a_{mn}$ converges for $m = 1, 2, 3, \ldots$. Moreover, when $N$ is chosen in this way, consider any finite sum of terms $a_{mn}$ which includes all terms $a_{mn}$ for $m \le N$, $n \le N$. Such a sum differs by at most $\epsilon$ from $\sum_{n=1}^{\infty}\left(\sum_{m=1}^{\infty} a_{mn}\right)$. From this observation it is easy to conclude that $\sum_{m=1}^{\infty}\left(\sum_{n=1}^{\infty} a_{mn}\right)$ converges to $\sum_{n=1}^{\infty}\left(\sum_{m=1}^{\infty} a_{mn}\right)$, as desired.


n=] \m=1 
o0 


More generally, a double series $\sum_{n=1}^{\infty}\left(\sum_{m=1}^{\infty} a_{mn}\right)$ is said to converge absolutely if the double series $\sum_{n=1}^{\infty}\left(\sum_{m=1}^{\infty} |a_{mn}|\right)$ of absolute values converges. When this is the case, a slight modification of the above proof shows that the double series itself converges. It can also be summed in either order:

$$\sum_{n=1}^{\infty}\left(\sum_{m=1}^{\infty} a_{mn}\right) = \sum_{m=1}^{\infty}\left(\sum_{n=1}^{\infty} a_{mn}\right).$$

Moreover, an absolutely convergent double series can be summed in any other order, e.g. $\sum_{N=1}^{\infty}\left(\sum_{m+n=N} a_{mn}\right)$, and the result is the same. Similar results apply to rearrangements of ordinary series. (See Exercises 7, 8, 9 of §9.5.)
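Here is a small numerical illustration of summing an absolutely convergent double series in different orders. It assumes the example $a_{mn} = (-1)^{m+n}/(2^m 3^n)$. This series is absolutely convergent, since $\sum\sum |a_{mn}| = \left(\sum 2^{-m}\right)\left(\sum 3^{-n}\right) = \frac12$. Its sum is $\left(\sum (-\frac12)^m\right)\left(\sum (-\frac13)^n\right) = (-\frac13)(-\frac14) = \frac{1}{12}$. The sketch compares a truncated iterated sum with a truncated sum by diagonals $m + n = N$, using exact fractions. The cutoffs and function names are hypothetical.

```python
from fractions import Fraction


def a(m, n):
    """Assumed example term a_{mn} = (-1)**(m + n) / (2**m 3**n), m, n >= 1."""
    return Fraction((-1) ** (m + n), 2 ** m * 3 ** n)


def sum_by_columns(K):
    """sum over n <= K of (sum over m <= K of a_{mn})."""
    return sum(sum(a(m, n) for m in range(1, K + 1)) for n in range(1, K + 1))


def sum_by_diagonals(D):
    """sum over N = 2, ..., D of (sum over m + n = N of a_{mn})."""
    return sum(sum(a(m, N - m) for m in range(1, N)) for N in range(2, D + 1))


exact = Fraction(1, 12)  # (-1/3) * (-1/4)
print(float(sum_by_columns(30) - exact), float(sum_by_diagonals(40) - exact))
```

Both errors are tiny. Each is bounded by the sum of $|a_{mn}|$ over the omitted terms, which is exactly where absolute convergence enters.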

Similar considerations apply to improper double integrals. For example, if $A(x, y)$ is a function defined on the entire $xy$-plane then

$$\int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} A(x, y)\,dx\right)dy, \qquad \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} A(x, y)\,dy\right)dx, \qquad \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} A(x, y)\,dx\,dy$$

are all defined in different ways. There is no guarantee a priori that if one converges then the others do, or that if two converge then their values are equal. However, the technique applied to double series above is easily generalized to prove the following. Suppose $A$ is continuous and $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} A(x, y)\,dx\,dy$ is absolutely convergent, i.e. the improper integral $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |A(x, y)|\,dx\,dy$ converges in the sense of §9.5. Then the improper integral $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} A(x, y)\,dx\,dy$ can be 'summed in any order' with the same result. For example, the iterated integrals above converge to the same value as the double integral, as does

$$\lim_{r \to \infty}\int_{\{x^2 + y^2 \le r^2\}} A(x, y)\,dx\,dy.$$

Thus, if an improper double integral is absolutely convergent, then the restriction to rectangles in the definition of $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} A(x, y)\,dx\,dy$ is inessential and any other sort of domain could be used as well. On the other hand, suppose a double integral is convergent but not absolutely convergent*. Then, as in the case of series, the order of summation is crucial and one must proceed with extreme care.

*An integral (or series) which is convergent but not absolutely convergent is said to be conditionally convergent.
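The formula

$$\begin{aligned}\log(1 + x) &= \int_1^{1+x} \frac{dt}{t} = \int_1^{1+x} \left[1 + (1 - t) + (1 - t)^2 + (1 - t)^3 + \cdots\right] dt \\ &= \int_1^{1+x} dt + \int_1^{1+x} (1 - t)\,dt + \int_1^{1+x} (1 - t)^2\,dt + \cdots \\ &= x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \frac{x^5}{5} - \cdots\end{aligned}$$

is obtained by interchanging the limit $\int_1^{1+x}$ and the limit $\frac1t = 1 + (1 - t) + (1 - t)^2 + \cdots$. Such an interchange must always be justified, but in this case the justification is quite easy. The formula

$$\frac1t = 1 + (1 - t) + (1 - t)^2 + \cdots + (1 - t)^n + \frac{(1 - t)^{n+1}}{t}$$

can be integrated from 1 to $1 + x$ (the integral of a finite sum is the sum of the integrals) to give

$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^n \frac{x^{n+1}}{n+1} + \int_1^{1+x} \frac{(1 - t)^{n+1}}{t}\,dt.$$

For any $x$ in the range $|x| < 1$, the integrand $(1 - t)^{n+1}/t$ is at most $|x|^{n+1}/(1 - |x|)$ in absolute value on the domain of integration. The integral is therefore at most $|x|$ times this. Given any $\epsilon > 0$, this integral can be made less than $\epsilon$ by making $n$ large. Hence the desired formula

$$(1)\qquad \log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots$$

holds for $|x| < 1$.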
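The remainder estimate can be checked numerically. The sketch compares the partial sum $x - \frac{x^2}{2} + \cdots + (-1)^n \frac{x^{n+1}}{n+1}$ with Python's `math.log1p`. It also compares the error with the bound $|x| \cdot \frac{|x|^{n+1}}{1 - |x|} = \frac{|x|^{n+2}}{1 - |x|}$ derived above. The sample points and function names are assumptions for illustration.

```python
import math


def log1p_series(x, n):
    """x - x**2/2 + ... + (-1)**n x**(n+1)/(n+1): the terms kept in the text."""
    return math.fsum((-1) ** k * x ** (k + 1) / (k + 1) for k in range(n + 1))


def remainder_bound(x, n):
    """|x| * |x|**(n+1) / (1 - |x|), the bound on the remainder integral (|x| < 1)."""
    return abs(x) ** (n + 2) / (1 - abs(x))


n = 40
for x in (-0.9, -0.5, 0.3, 0.9):
    err = abs(math.log1p(x) - log1p_series(x, n))
    print(x, err, remainder_bound(x, n))
```

The bound is crude, especially near $|x| = 1$, but it tends to zero as $n$ grows for each fixed $|x| < 1$. That is all the argument needs.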
This formula (1) cannot hold for $|x| > 1$ or for $x=-1$ because in these cases the series on the right does not converge. However, the alternating series $1-\frac12+\frac13-\frac14+\cdots$ obtained by setting $x=1$ does converge and one would naturally expect that its sum is $\log(1+1)=\log2$, even though the above proof is no longer valid. A slight refinement of this proof suffices to verify* that $\log2=1-\frac12+\frac13-\frac14+\cdots$; but this conclusion can also be reached by appealing to an important theorem on the interchange of limits, namely

Abel's Theorem: If $a_1+a_2+a_3+\cdots$ is a convergent series then the power series $a_1x+a_2x^2+a_3x^3+\cdots$ is convergent for all numbers $x$ in the range† $0 < x < 1$ and

$$\lim_{x\uparrow1}\left[a_1x+a_2x^2+\cdots\right]=a_1+a_2+a_3+\cdots.$$

That is, given $\epsilon > 0$ there is a $\delta > 0$ such that for any $x$ in the range $1-\delta < x < 1$ the sum of the power series differs by less than $\epsilon$ from the sum of the original series.

*See Exercise 6, §9.5, for an alternative proof of this formula.

†In fact, comparison with the geometric series shows easily that it is convergent for $|x| < 1$. Only the values of $x$ near 1 are of interest in the theorem.

The proof of Abel's Theorem is based on the formula‡

$$\begin{aligned}
a_mx^m+a_{m+1}x^{m+1}+\cdots+a_nx^n
&=a_m\left[(x^m-x^{m+1})+(x^{m+1}-x^{m+2})+\cdots+(x^{n-1}-x^n)+x^n\right]\\
&\quad+a_{m+1}\left[(x^{m+1}-x^{m+2})+\cdots+(x^{n-1}-x^n)+x^n\right]\\
&\quad+\cdots+a_{n-1}\left[(x^{n-1}-x^n)+x^n\right]+a_nx^n\\
&=(x^m-x^{m+1})a_m+(x^{m+1}-x^{m+2})(a_m+a_{m+1})\\
&\quad+(x^{m+2}-x^{m+3})(a_m+a_{m+1}+a_{m+2})+\cdots\\
&\quad+(x^{n-1}-x^n)(a_m+a_{m+1}+\cdots+a_{n-1})\\
&\quad+x^n(a_m+a_{m+1}+\cdots+a_n).
\end{aligned}$$

‡This formula is analogous to integration by parts. Specifically, let $A$ denote the sequence $A_j=a_m+a_{m+1}+\cdots+a_j$ and let $B$ denote the sequence $x^j$. Then the sum on the left is $\sum\Delta A\cdot B$ where $\Delta A$ represents the sequence of differences $A_j-A_{j-1}=a_j$. The formula is then the statement $\sum\Delta A\cdot B=-\sum A\cdot\Delta B+(A\cdot B)\big|_m^n$, analogous to integration by parts.
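The summation-by-parts identity is exact, so it can be checked in rational arithmetic. A minimal Python sketch; the coefficients (the alternating harmonic ones), the value $x=\frac34$ and the helper names are arbitrary choices for illustration:

```python
from fractions import Fraction

def direct_sum(a, x, m, n):
    # a_m x^m + a_{m+1} x^{m+1} + ... + a_n x^n, where a[j] holds a_j
    return sum(a[j] * x ** j for j in range(m, n + 1))

def summed_by_parts(a, x, m, n):
    # sum_{j=m}^{n-1} (x^j - x^(j+1)) (a_m + ... + a_j)  +  x^n (a_m + ... + a_n)
    total, partial = Fraction(0), Fraction(0)
    for j in range(m, n):
        partial += a[j]
        total += (x ** j - x ** (j + 1)) * partial
    partial += a[n]
    return total + x ** n * partial

# a_j = (-1)^(j+1)/j; a[0] is a placeholder so that a[j] is a_j
a = [Fraction(0)] + [Fraction((-1) ** (j + 1), j) for j in range(1, 40)]
x = Fraction(3, 4)
print(direct_sum(a, x, 5, 30) == summed_by_parts(a, x, 5, 30))
```

Exact fractions are used so that equality can be tested with `==` rather than up to rounding error.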
Given $\epsilon > 0$, let $N$ be such that $|a_m+a_{m+1}+\cdots+a_n| < \epsilon$ whenever $n\ge m > N$. Taking absolute values in the above formula and using the triangle inequality gives

$$\begin{aligned}
|a_mx^m+a_{m+1}x^{m+1}+\cdots+a_nx^n|
&\le|x^m-x^{m+1}|\epsilon+|x^{m+1}-x^{m+2}|\epsilon+\cdots+|x^{n-1}-x^n|\epsilon+|x^n|\epsilon\\
&=(x^m-x^{m+1}+x^{m+1}-x^{m+2}+\cdots+x^{n-1}-x^n+x^n)\epsilon=x^m\epsilon < \epsilon
\end{aligned}$$

for $x$ in the range $0 < x < 1$. This proves not only that the power series is convergent but also that its sum differs by at most $\epsilon$ from the sum of the first $N$ terms $a_1x+a_2x^2+\cdots+a_Nx^N$. But $a_1+a_2+\cdots+a_N$ differs by at most $\epsilon$ from $\sum_{n=1}^\infty a_n$ and

$$\begin{aligned}
&|(a_1+a_2+\cdots+a_N)-(a_1x+a_2x^2+\cdots+a_Nx^N)|\\
&\quad\le|a_1(1-x)|+|a_2(1-x)(1+x)|+\cdots+|a_N(1-x)(1+x+x^2+\cdots+x^{N-1})|\\
&\quad\le|1-x|\,(|a_1|+2|a_2|+\cdots+N|a_N|)
\end{aligned}$$

which (because $N$ is fixed) is less than $\epsilon$ for $x$ sufficiently near 1. Thus $\sum_{n=1}^\infty a_nx^n$ differs by at most $3\epsilon$ from $\sum_{n=1}^\infty a_n$ for all $x < 1$ sufficiently near 1. Since $\epsilon$ was arbitrary this completes the proof of Abel's theorem.
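To see Abel's Theorem at work on the example that motivated it, one can evaluate $\sum(-1)^{n+1}x^n/n$ for $x$ close to 1 and compare it with $\log2$. In this Python sketch the truncation rule (stop once $x^n$ drops below a tolerance) is an assumption made for the illustration; it is adequate here because the coefficients are bounded by 1.

```python
import math

def abel_power_sum(coeff, x, tol=1e-15):
    """Sum a_1 x + a_2 x^2 + ... for 0 < x < 1, stopping once x**n < tol.
    coeff(n) returns a_n; the coefficients are assumed to be bounded."""
    total, n, xn = 0.0, 1, x
    while xn >= tol:
        total += coeff(n) * xn
        n += 1
        xn *= x
    return total

def alt_harmonic(n):
    # a_n = (-1)^(n+1)/n, so a_1 + a_2 + ... = 1 - 1/2 + 1/3 - ...
    return (-1) ** (n + 1) / n

for x in (0.9, 0.99, 0.999):
    print(x, abel_power_sum(alt_harmonic, x), math.log(2))
```

For each $x$ the computed value is $\log(1+x)$ by (1), and these values approach $\log2$ as $x\uparrow1$, as the theorem predicts.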

An alternative statement of Abel's theorem which is important is the following: A series $a_1+a_2+a_3+\cdots$ is said to be Abel summable if the power series $a_1x+a_2x^2+a_3x^3+\cdots$ converges for $|x| < 1$ and if $\lim_{x\uparrow1}(a_1x+a_2x^2+\cdots)$ exists. When this is the case the Abel sum of the series is this limit. Then Abel's theorem states that a convergent series is Abel summable and that its Abel sum is its ordinary sum. The converse of this theorem is not true, that is, a series can be Abel summable without being convergent. For example, the series $1-1+1-1+1-1+\cdots$ is not convergent, but

$$x-x^2+x^3-x^4+\cdots=\frac{x}{1+x}$$

is convergent for $|x| < 1$ and

$$\lim_{x\uparrow1}\frac{x}{1+x}=\frac12;$$

hence $1-1+1-1+\cdots$ is Abel summable with Abel sum $\frac12$. Note that any series $a_1+a_2+\cdots$ which is Abel summable but not convergent gives an example of limits which cannot be interchanged, namely,

$$\lim_{x\uparrow1}\lim_{N\to\infty}\left(\sum_{n=1}^Na_nx^n\right)\ne\lim_{N\to\infty}\lim_{x\uparrow1}\left(\sum_{n=1}^Na_nx^n\right)$$

because the limit on the right does not exist.
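The two orders of limits can be compared numerically. In this Python sketch (helper names hypothetical) the power series for $1-1+1-\cdots$ approaches $x/(1+x)$, hence $\frac12$ as $x\uparrow1$, while the ordinary partial sums keep oscillating between 1 and 0.

```python
def grandi_power_sum(x, terms):
    # x - x^2 + x^3 - ... (first `terms` terms); for 0 < x < 1 this approximates x/(1+x)
    return sum((-1) ** (n + 1) * x ** n for n in range(1, terms + 1))

def grandi_partial_sum(N):
    # the ordinary partial sums 1 - 1 + 1 - ... (N terms) alternate between 1 and 0
    return sum((-1) ** (n + 1) for n in range(1, N + 1))

for x in (0.9, 0.99, 0.999):
    print(x, grandi_power_sum(x, 40000), x / (1 + x))
print([grandi_partial_sum(N) for N in range(1, 7)])
```

The number of terms (40000) is an arbitrary choice, large enough that $x^{40000}$ is negligible for the values of $x$ shown.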
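The formula of §9.2

$$(2)\qquad\int_{-\infty}^\infty\frac{\sin xy}{y}\,dy=\begin{cases}\pi&\text{if }x > 0\\0&\text{if }x=0\\-\pi&\text{if }x < 0\end{cases}$$

gives another example of a double limit which cannot be interchanged, namely

$$\lim_{x\downarrow0}\int_{-\infty}^\infty\frac{\sin xy}{y}\,dy=\pi\ne0=\int_{-\infty}^\infty0\,dy=\int_{-\infty}^\infty\lim_{x\downarrow0}\frac{\sin xy}{y}\,dy.$$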


The formula (2) itself can be seen as the result of an interchange of limits (this time a valid one) as follows: If $z > 0$ then

$$\int_0^\infty e^{-zt}\,dt=\lim_{K\to\infty}\int_0^Ke^{-zt}\,dt=\lim_{K\to\infty}\left[-\frac{e^{-zt}}{z}\right]_0^K=\lim_{K\to\infty}\frac{1-e^{-zK}}{z}=\frac1z.$$

Assuming this formula is still valid for complex numbers $z=a+bi$ in which $a > 0$ gives

$$\int_0^\infty e^{-(a+bi)t}\,dt=\int_0^\infty e^{-at}[\cos bt-i\sin bt]\,dt=\frac1{a+bi}=\frac{a-bi}{(a+bi)(a-bi)}=\frac{a-bi}{a^2+b^2}$$

and equating imaginary parts

$$\int_0^\infty e^{-at}\sin bt\,dt=\frac{b}{a^2+b^2}.$$

Multiplying both sides by $da$ and integrating from $a=0$ to $a=\infty$ gives

$$\int_0^\infty\int_0^\infty e^{-at}\sin bt\,dt\,da=\int_0^\infty\frac{b\,da}{a^2+b^2}.$$

The integral on the right is zero if $b=0$, $\pi/2$ if $b > 0$ (take $u=a/b$ and use $\int_0^\infty\frac{du}{1+u^2}=\frac\pi2$), $-\pi/2$ if $b < 0$ (take $u=-a/b$). Interchanging the order of integration on the left gives

$$\int_0^\infty\int_0^\infty e^{-at}\sin bt\,da\,dt=\int_0^\infty\left[\frac{-e^{-at}}{t}\right]_{a=0}^{a=\infty}\sin bt\,dt=\int_0^\infty\frac{\sin bt}{t}\,dt$$

and (2) follows. The steps in this argument can be justified to give a proof of (2) as follows: The essence of the argument is that the improper integral $\int_0^\infty\int_0^\infty e^{-at}\sin bt\,dt\,da$ ($b$ fixed) can be evaluated by using either the formula

$$d\left[\frac{e^{-at}}{t}\sin bt\,dt\right]=e^{-at}\sin bt\,dt\,da$$

or the formula

$$d\left[-\frac{e^{-at}(a\sin bt+b\cos bt)}{a^2+b^2}\,da\right]=e^{-at}\sin bt\,dt\,da$$


and applying Stokes' theorem. In fact, if the improper integral were absolutely convergent, i.e. if $\int_0^\infty\int_0^\infty e^{-at}|\sin bt|\,dt\,da$ converged, then the analog of the argument proving $\sum_{m=1}^\infty\left(\sum_{n=1}^\infty a_{mn}\right)=\sum_{n=1}^\infty\left(\sum_{m=1}^\infty a_{mn}\right)$ for absolutely convergent double series would show that these two methods of computing $\int_0^\infty\int_0^\infty e^{-at}\sin bt\,dt\,da$ give the same result and hence prove (2). However, this improper integral is not absolutely convergent and a somewhat more careful analysis is necessary. Setting $K=2\pi n/b$ (with $b > 0$) for $n$ a large positive integer, so that $\sin bK=0$ and $\cos bK=1$, the formulas above give

$$\int_0^K\int_0^Ke^{-at}\sin bt\,dt\,da=\int_0^K\frac{\sin bt}{t}\,dt-\int_0^Ke^{-Kt}\frac{\sin bt}{t}\,dt$$

and

$$\int_0^K\int_0^Ke^{-at}\sin bt\,dt\,da=\int_0^K\frac{b}{a^2+b^2}\,da-\int_0^Ke^{-Ka}\frac{b}{a^2+b^2}\,da.$$

Equating these two expressions for $\int_0^K\int_0^Ke^{-at}\sin bt\,dt\,da$ and using the fact that the integrals

$$\int_0^\infty\frac{\sin bt}{t}\,dt,\qquad\int_0^\infty\frac{b}{a^2+b^2}\,da$$

converge, their equality follows by taking the limit as $n\to\infty$ and showing that

$$\lim_{K\to\infty}\int_0^Ke^{-Kt}\frac{\sin bt}{t}\,dt=0,\qquad\lim_{K\to\infty}\int_0^Ke^{-Ka}\frac{b}{a^2+b^2}\,da=0.$$

To prove that the first of these integrals approaches zero as $K\to\infty$ let $\delta$ be a positive number and let the domain of integration $\{0 < t < K\}$ be broken into the intervals $\{0 < t < \delta\}$, $\{\delta < t < K\}$. This gives

$$\left|\int_0^Ke^{-Kt}\frac{\sin bt}{t}\,dt\right|\le\int_0^\delta e^{-Kt}b\,dt+\int_\delta^Ke^{-Kt}\frac{|\sin bt|}{t}\,dt$$

(using the fact that $|\sin bt|/t$ assumes its maximum value $b$ at $t=0$)

$$\le b\,\frac{1-e^{-K\delta}}{K}+e^{-K\delta}\int_\delta^K\frac{|\sin bt|}{t}\,dt\le b\,\frac{1-e^{-K\delta}}{K}+e^{-K\delta}\log\frac{K}{\delta}$$

which approaches zero as $K\to\infty$. The proof that $\int_0^Ke^{-Ka}\frac{b}{a^2+b^2}\,da$ approaches zero as $K\to\infty$ is virtually identical and completes the proof of (2).
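The two evaluations of $\int_0^K\int_0^Ke^{-at}\sin bt\,dt\,da$ can be compared numerically. This Python sketch uses a hand-written composite Simpson rule; the values $b=1$, $n=50$, the step counts, and the cut-off of the rapidly decaying integrands at $40/K$ (where $e^{-40}$ is negligible) are assumptions chosen for the illustration.

```python
import math

def simpson(f, lo, hi, steps):
    """Composite Simpson's rule for f on [lo, hi]; steps is rounded up to an even number."""
    steps += steps % 2
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

def sin_over_t(t, b):
    # sin(bt)/t, extended continuously by its limiting value b at t = 0
    return b if t == 0 else math.sin(b * t) / t

b, n = 1.0, 50
K = 2 * math.pi * n / b     # the choice K = 2*pi*n/b made in the text
cut = min(K, 40.0 / K)      # beyond t = 40/K the factor e^{-Kt} is below e^{-40}

dirichlet = simpson(lambda t: sin_over_t(t, b), 0.0, K, 20000)
damped_t = simpson(lambda t: math.exp(-K * t) * sin_over_t(t, b), 0.0, cut, 2000)
arctan_part = simpson(lambda a: b / (a * a + b * b), 0.0, K, 20000)
damped_a = simpson(lambda a: math.exp(-K * a) * b / (a * a + b * b), 0.0, cut, 2000)

# the two expressions for the integral over the square {0 < t < K, 0 < a < K}
print(dirichlet - damped_t, arctan_part - damped_a)
print(dirichlet, math.pi / 2)
```

Both damped terms come out of order $1/K$, and the undamped integral is already close to $\pi/2$, in line with the limits established above.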
The product formula for the sine

$$(3)\qquad\sin x=x\prod_{n=1}^\infty\left(1-\frac{x^2}{n^2\pi^2}\right)$$

can be derived from the trigonometric identity

$$(3')\qquad\sin x=p\sin\left(\frac xp\right)\prod_{n=1}^{(p-1)/2}\left(1-\frac{\sin^2(x/p)}{\sin^2(n\pi/p)}\right)$$

valid for all $x$ and for all odd positive integers $p$. To derive the identity (3') set $p=2q+1$, where $q$ is a positive integer, and expand De Moivre's formula

$$\cos pA+i\sin pA=[\cos A+i\sin A]^p$$

by the binomial theorem to obtain

$$\cos pA+i\sin pA=\sum_{\nu=0}^p\binom p\nu i^\nu\cos^{p-\nu}A\,\sin^\nu A.$$

In taking the imaginary parts on both sides only the terms with $\nu$ odd appear on the right; setting $y=\sin A$, $\cos^2A=1-y^2$, this gives an identity of the form

$$\sin pA=p(1-y^2)^qy+\cdots+(-1)^qy^p=py+\text{terms in }y^3,y^5,\ldots,y^p.$$

The right-hand side is a polynomial of degree $p$ in $y$ which must be zero whenever $y=\sin A$ for $A=0,\pm\frac\pi p,\pm\frac{2\pi}p,\ldots,\pm\frac{q\pi}p$. Since these $p$ numbers $\sin A$ are distinct, they account for all the roots of the polynomial, and since the coefficient of $y$ is $p$ the polynomial must be

$$py\left(1-\frac{y^2}{\sin^2(\pi/p)}\right)\left(1-\frac{y^2}{\sin^2(2\pi/p)}\right)\cdots\left(1-\frac{y^2}{\sin^2(q\pi/p)}\right).$$

Setting $x=pA$, $y=\sin A=\sin(x/p)$ then gives (3'). Now since

$$\lim_{p\to\infty}p\sin\left(\frac xp\right)=\lim_{p\to\infty}x\cdot\frac{\sin(x/p)-\sin0}{x/p}=x\cdot\cos0=x$$

and

$$\lim_{p\to\infty}\frac{\sin^2(x/p)}{\sin^2(n\pi/p)}=\frac{x^2}{n^2\pi^2},$$

the formula (3) follows formally from (3') by letting $p\to\infty$. This passage to the limit involves an interchange of limits (the product of the limits is equal to the limit of the products) which must be justified:
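Let $x$ be fixed. For each $n$ let $a=n\pi/p$, $b=x/(n\pi)$, so that $ab=x/p$ and the term of (3') corresponding to $n$ is 1 if $a > \pi/2$ (that is, if $n > q$, so that (3') has no such factor) and $1-\sin^2ab/\sin^2a$ if $a < \pi/2$. Then the inequalities

$$|\sin ab|\le a|b|\quad(a > 0),\qquad\frac2\pi a\le\sin a\quad\left(0 < a < \frac\pi2\right)$$

show that the $n$th factor of the product (3') lies between 1 and

$$1-\left(\frac{a|b|}{(2/\pi)a}\right)^2=1-\frac{x^2}{4n^2}$$

(if $n > |x|/2$) for all $p$. Now the product $\prod_{n=1}^\infty\left(1-\frac{x^2}{4n^2}\right)$ converges by the theorem of §9.5, which means that for any $\epsilon$ there is an $N$ such that any product of a finite number of factors of $\prod_{n=N}^\infty\left(1-\frac{x^2}{4n^2}\right)$ is within $\epsilon$ of 1. It follows that the product of all factors past the $N$th in (3') (of which all but a finite number are 1) differs by less than $\epsilon$ from 1. Hence for all (odd) $p$ and for all sufficiently large $N$

$$\sin x=p\sin\left(\frac xp\right)\prod_{n=1}^N\left(1-\frac{\sin^2(x/p)}{\sin^2(n\pi/p)}\right)\cdot(1+\delta_1)$$

where $|\delta_1| < \epsilon$ (factors with $n > q$ being read as 1). On the other hand, for all sufficiently large $N$

$$x\prod_{n=1}^\infty\left(1-\frac{x^2}{n^2\pi^2}\right)=x\prod_{n=1}^N\left(1-\frac{x^2}{n^2\pi^2}\right)\cdot(1+\delta_2)$$

where $|\delta_2| < \epsilon$ (by the definition of convergence of a product). Finally, for any fixed $N$

$$\lim_{p\to\infty}p\sin\left(\frac xp\right)\prod_{n=1}^N\left(1-\frac{\sin^2(x/p)}{\sin^2(n\pi/p)}\right)=x\prod_{n=1}^N\left(1-\frac{x^2}{n^2\pi^2}\right)$$

(in a finite product the limit of the product is the product of the limits) so that when $p$ is sufficiently large the first $N$ factors of (3') can be obtained from $x\prod_{n=1}^N\left(1-\frac{x^2}{n^2\pi^2}\right)$ by adding a number $\delta_3$ with $|\delta_3| < \epsilon$. All together this gives

$$\sin x=\left[x\prod_{n=1}^N\left(1-\frac{x^2}{n^2\pi^2}\right)+\delta_3\right](1+\delta_1)=\left[\frac{x\prod_{n=1}^\infty\left(1-\frac{x^2}{n^2\pi^2}\right)}{1+\delta_2}+\delta_3\right](1+\delta_1).$$

Since $\delta_1,\delta_2,\delta_3$ are arbitrarily small, (3) follows.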
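Both the finite identity (3') and the slow convergence of the partial products of (3) are easy to observe numerically. A Python sketch; the function names and the sample values of $x$, $p$ and $N$ are illustrative choices:

```python
import math

def identity_rhs(x, p):
    """Right side of (3') for an odd positive integer p:
    p sin(x/p) * prod_{n=1}^{(p-1)/2} (1 - sin^2(x/p) / sin^2(n pi / p))."""
    if p <= 0 or p % 2 == 0:
        raise ValueError("p must be an odd positive integer")
    s = math.sin(x / p)
    result = p * s
    for n in range(1, (p - 1) // 2 + 1):
        result *= 1 - s * s / math.sin(n * math.pi / p) ** 2
    return result

def partial_product(x, N):
    """x * prod_{n=1}^{N} (1 - x^2 / (n^2 pi^2)), the N-th partial product of (3)."""
    result = x
    for n in range(1, N + 1):
        result *= 1 - x * x / (n * n * math.pi ** 2)
    return result

x = 1.3
print([identity_rhs(x, p) - math.sin(x) for p in (3, 7, 101)])
print([partial_product(x, N) - math.sin(x) for N in (10, 100, 10000)])
```

The first list is zero up to rounding for every odd $p$, since (3') is an exact identity; the second shrinks only like $1/N$, reflecting the tail factor $\prod_{n > N}(1-x^2/n^2\pi^2)$.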
As a final example, consider the evaluation of the improper integral

$$(4)\qquad f(x)=\int_{-\infty}^\infty e^{-t^2}\cos xt\,dt.$$

(This problem is of central importance in the theory of diffusion and heat conduction.) For $x=0$ this is the integral

$$(5)\qquad\int_{-\infty}^\infty e^{-t^2}\,dt=\sqrt\pi$$

which was evaluated in §9.5. Because the factor $\cos xt$ of (4) introduces cancellations into the integral (5), it can be seen that the given integral (4) converges for all $x$ and that its value $f(x)$ is largest when $x=0$, namely $f(0)=\sqrt\pi > f(x)$ for $x\ne0$. Clearly $f(-x)=f(x)$. Plausible but less obvious are the facts that $\lim_{x\to\infty}f(x)=0$ and $f(x) > 0$ (all $x$). Now $f(x)$ is in fact an elementary function (in the technical sense of §9.2), namely the function

$$(6)\qquad f(x)=\sqrt\pi\,e^{-x^2/4}.$$

This formula can be derived as follows: Expanding $\cos xt$ as a power series $\cos xt=1-(xt)^2/2!+(xt)^4/4!-\cdots$ and interchanging the integral and the sum of the series gives

$$f(x)=\int_{-\infty}^\infty e^{-t^2}\,dt-\frac{x^2}{2!}\int_{-\infty}^\infty t^2e^{-t^2}\,dt+\frac{x^4}{4!}\int_{-\infty}^\infty t^4e^{-t^2}\,dt-\cdots$$

assuming the interchange is valid. This will give a power series expansion of $f(x)$ if the integrals $\int_{-\infty}^\infty t^{2n}e^{-t^2}\,dt$ can be evaluated. Making the change of variable $t=\sqrt\alpha\,u$ in (5) gives

$$(5')\qquad\int_{-\infty}^\infty e^{-\alpha u^2}\,du=\sqrt\pi\,\alpha^{-1/2}.$$

Differentiating both sides of this equation with respect to $\alpha$ and interchanging the differentiation and integration gives

$$\int_{-\infty}^\infty(-u^2)e^{-\alpha u^2}\,du=\left(-\tfrac12\right)\alpha^{-3/2}\sqrt\pi$$

assuming the interchange is valid. Differentiating repeatedly gives

$$\int_{-\infty}^\infty(-u^2)^2e^{-\alpha u^2}\,du=\left(-\tfrac12\right)\left(-\tfrac32\right)\alpha^{-5/2}\sqrt\pi,$$
$$\int_{-\infty}^\infty(-u^2)^3e^{-\alpha u^2}\,du=\left(-\tfrac12\right)\left(-\tfrac32\right)\left(-\tfrac52\right)\alpha^{-7/2}\sqrt\pi,$$

etc. Setting $\alpha=1$, cancelling the minus signs, and using the result in the power series for $f(x)$ gives, as desired,

$$f(x)=\sqrt\pi\left[1-\frac{x^2}{2!}\cdot\frac12+\frac{x^4}{4!}\cdot\frac{3\cdot1}{2^2}-\frac{x^6}{6!}\cdot\frac{5\cdot3\cdot1}{2^3}+\cdots\right]=\sqrt\pi\left[1-\frac{x^2}{2^2}+\frac{x^4}{2!\,2^4}-\frac{x^6}{3!\,2^6}+\cdots\right]=\sqrt\pi\,e^{-x^2/4}.$$
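The formal computation can be cross-checked numerically, comparing a quadrature of (4), the power series just obtained, and the closed form (6). A Python sketch; truncating (4) to $\{-10\le t\le10\}$, the step count and the number of series terms are assumptions made for the illustration:

```python
import math

def simpson(f, lo, hi, steps):
    """Composite Simpson's rule for f on [lo, hi]; steps is rounded up to an even number."""
    steps += steps % 2
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

def f_quadrature(x, L=10.0, steps=4000):
    # (4) truncated to [-L, L]; the discarded tails are smaller than e^{-L^2}
    return simpson(lambda t: math.exp(-t * t) * math.cos(x * t), -L, L, steps)

def f_series(x, terms=60):
    # sqrt(pi) * sum_{n < terms} (-1)^n (x^2/4)^n / n!, the series derived above
    u = -x * x / 4
    return math.sqrt(math.pi) * sum(u ** n / math.factorial(n) for n in range(terms))

for x in (0.0, 1.0, 3.0):
    closed = math.sqrt(math.pi) * math.exp(-x * x / 4)
    print(x, f_quadrature(x), f_series(x), closed)
```

Agreement of the three columns is evidence for (6), not a proof; the interchanges still have to be justified as described next.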


In order to make this argument into a proof, one must justify the two operations of 'passing to the limit under an integral sign' which it involves, that is, one must prove that the integral of the infinite series is the sum of the integrals and that the integral of the derivative is the derivative of the integral. Since the main theorem on passing to the limit under an integral sign is given in the next section, these justifications are postponed until then.

There is no general theorem governing interchanges of limits, but rather many theorems, such as Abel's Theorem, covering various types of interchanges. In many cases it is easiest to justify an interchange directly, rather than by appeal to a general theorem; the comparison of the product (3') to the product $\prod\left(1-\frac{x^2}{4n^2}\right)$ is a standard technique for such proofs (see Exercise 5).

Exercises

1 Write $(1+t^2)^{-1}$ as an infinite series, integrate, and apply Abel's Theorem to obtain Leibniz's formula

$$\frac\pi4=1-\frac13+\frac15-\frac17+\cdots,$$

justifying all steps fully.

2 Evaluate $\int_{-\infty}^\infty e^{-au^2}\cos(yu)\,du$ ($y$, $a$ given, with $a > 0$) by performing a change of variables in the formula $\int_{-\infty}^\infty e^{-t^2}\cos xt\,dt=\sqrt\pi\,e^{-x^2/4}$.

3 Evaluate $f(x)=\int_{-\infty}^\infty\frac{\sin y}{y}\cos xy\,dy$. [Using a trigonometric identity for $\sin y\cos xy$ this can be reduced to two applications of formula (2) of the text.] Draw the graph of $f(x)$.


4 An example of a non-differentiable function. Let

$$f(x)=\sin x+\tfrac12\sin2x+\tfrac14\sin4x+\cdots+\frac1{2^n}\sin2^nx+\cdots.$$

Show that $f(x)$ is defined for all $x$ and is uniformly continuous. [As usual, break the step from $f(x)$ to $f(x')$ into three steps: from $f(x)$ to the sum of the first $N$ terms evaluated at $x$, then to the sum of the first $N$ terms at $x'$, then to $f(x')$. Show that by making $N$ large and then making $|x-x'|$ small, all three steps can be made small.] Sketch the graph of $\frac1{2^n}\sin2^nx$ for $n=1,2,3,4$, and of their derivatives. These diagrams make it plausible that $f(x)$ is not differentiable at any point. It can in fact be shown that $\lim_{s\to0}[f(x+s)-f(x)]/s$ does not exist at any point, and the proof is not difficult, although somewhat long.*

5 Weierstrass 'M-test'. Generalizing the example of Exercise 4, show that if

$$f(x)=u_1(x)+u_2(x)+u_3(x)+\cdots$$

is an infinite series of functions of $x$, say on an interval $\{a\le x\le b\}$, such that (i) each of the functions $u_n(x)$ is uniformly continuous† on $\{a\le x\le b\}$ and such that (ii) there is a convergent series of positive constants $\sum_{n=1}^\infty M_n$ such that $|u_n(x)|\le M_n$ for all $x$ in $\{a\le x\le b\}$ and for all $n=1,2,3,\ldots$, then $f(x)=\sum_{n=1}^\infty u_n(x)$ is defined and uniformly continuous on $\{a\le x\le b\}$.

*See B. Sz.-Nagy, Introduction to Real Functions and Orthogonal Expansions, Oxford University Press, New York, 1965, pp. 107-103, for a different example with a complete proof of nondifferentiability.

†Of course by Theorem 2, §9.4, every continuous function on $\{a\le x\le b\}$ is uniformly continuous.


6 Let $f(x)=a_0+a_1x+a_2x^2+a_3x^3+\cdots$ be a function defined by a power series (for all values of $x$ where this series converges). Show that if $\bar x$ is a point such that $f(\bar x)$ is defined, then $f(x)$ is defined and uniformly continuous on any interval $\{-|\bar x|+\epsilon\le x\le|\bar x|-\epsilon\}$ for $\epsilon > 0$. [This is an application of the 'M-test'. If $\sum a_n\bar x^n$ converges then the absolute values $|a_n\bar x^n|$ must be bounded, from which $\sum|a_n|\rho^n$ can be shown to converge for $0 < \rho < |\bar x|$. Set $M_n=|a_n|\rho^n$ for $\rho$ near $|\bar x|$.]


7 Show that if the power series for $1/(1+x)$ is multiplied formally by the power series for $1/(1-x)$ the result is the power series for $1/(1-x^2)$.


8 Multiplication of power series. Let $f(x)=a_0+a_1x+a_2x^2+\cdots$ and $g(x)=b_0+b_1x+b_2x^2+\cdots$ be two functions defined by power series. Show that if $\bar x$ is a point such that both $f(\bar x)$ and $g(\bar x)$ are defined then the power series

$$(*)\qquad a_0b_0+(a_0b_1+a_1b_0)x+(a_0b_2+a_1b_1+a_2b_0)x^2+\cdots+(a_nb_0+a_{n-1}b_1+\cdots+a_0b_n)x^n+\cdots$$

converges to $f(x)g(x)$ whenever $|x| < |\bar x|$. [Show that the double series $\sum_{m=0}^\infty\sum_{n=0}^\infty a_mb_nx^mx^n$ is absolutely convergent for $|x| < |\bar x|$, hence can be summed in any order and, in particular, in the order (*).]


9 Operating formally (i.e. without justifying all steps), write $\log\left(1-\frac{x^2}{n^2\pi^2}\right)$ as a power series in $x$, then sum over $n$ to obtain the power series for $\log(\sin x/x)$. A power series $\log(\sin x/x)=a_2x^2+a_4x^4+a_6x^6+\cdots$ can also be found by the following method: The power series

$$2a_2x+4a_4x^3+6a_6x^5+\cdots$$

represents

$$\frac{d}{dx}\log(\sin x/x)$$

and should therefore satisfy

$$\left(\frac{d}{dx}\log(\sin x/x)\right)x\sin x=x\cos x-\sin x.$$

Using the power series for $\sin x$, $\cos x$, multiplying, and equating coefficients gives equations which can be solved for $a_2,a_4,a_6,\ldots$ successively. Find $a_2,a_4,a_6$. The two formulas for $a_2$ give

$$1+\frac1{2^2}+\frac1{3^2}+\cdots=\frac{\pi^2}6$$

and the two formulas for $a_4$ give

$$1+\frac1{2^4}+\frac1{3^4}+\cdots=\frac{\pi^4}{90}.$$

Find

$$1+\frac1{2^6}+\frac1{3^6}+\cdots.$$


10 The factorial function. Formally, the product formula for $\sin x$ can be written

$$\sin x=x\cdot\prod_{n=1}^\infty\left(1+\frac x{n\pi}\right)\cdot\prod_{n=1}^\infty\left(1-\frac x{n\pi}\right),$$

but this is in fact meaningless because these two infinite products diverge (unless $x=0$). However, the limit

$$\lim_{N\to\infty}\prod_{n=1}^N\left(1+\frac x{n\pi}\right)N^{-x/\pi}$$

does exist for all $x$, and defines a function $F(x)$ which satisfies $\sin x=x\cdot F(x)\cdot F(-x)$.

(a) Rather than the limit above, it is simpler to set $y=x/\pi$ and to consider

$$\lim_{N\to\infty}\prod_{n=1}^N\left(1+\frac yn\right)N^{-y}.$$

By direct evaluation, show that for $y=1,2,3,\ldots$ this limit exists and is equal to $1/y!$. Thus if one defines a function $\Pi(y)$ by the equation

$$(*)\qquad\frac1{\Pi(y)}=\lim_{N\to\infty}\prod_{n=1}^N\left(1+\frac yn\right)N^{-y}$$

whenever this limit exists and is not zero, the function $\Pi(y)$ is an extension of the factorial function.

(b) What value for $0!$ is indicated by the formula

$$\binom nk=\frac{n!}{k!\,(n-k)!}?$$

Does this agree with (*)?

(c) Show that the limit (*) exists for all values of $y$, and that it defines $\Pi(y)$ for all values of $y$ except $y=-1,-2,-3,\ldots$. [Rewriting the limit (*) as

$$\lim_{N\to\infty}\prod_{n=1}^N\left(1+\frac yn\right)\cdot\prod_{n=1}^{N-1}\left(1+\frac1n\right)^{-y}$$

the problem is to show that the product $a_1a_2a_3\cdots$ with

$$a_n=\left(1+\frac yn\right)\left(1+\frac1n\right)^{-y}$$

converges. Now

$$\left(1+\frac1n\right)^{-y}=1-\frac yn+\text{terms in }\frac1{n^2},\frac1{n^3},\ldots,$$

hence

$$a_n=1+\text{terms in }\frac1{n^2},\frac1{n^3},\ldots;$$

hence for large values of $n$ the factors $a_n$ are like $\left(1+\frac{\text{const}}{n^2}\right)$, which indicates that their product converges, as desired. To make this a rigorous proof, prove that there is a constant $K$ such that

$$|\log(1+x)-x| < Kx^2$$

for all sufficiently small $x$, hence that $|\log a_n| < K(y^2+|y|)\frac1{n^2}$, hence that $\sum_{n=N}^\infty\log a_n$ converges, hence that $\prod_{n=1}^\infty a_n$ converges. The product is zero only if a factor is zero.]

(d) Prove the formulas

$$\frac{\sin\pi y}{\pi y}=\frac1{\Pi(y)\,\Pi(-y)},\qquad\Pi(y)=y\,\Pi(y-1).$$

(e) Combining the formulas of (d), prove that $\Pi(-\frac12)=\sqrt\pi$ and derive from this the values of $\Pi(\frac12),\Pi(\frac32),\ldots,\Pi(n-\frac12)$ and $\Pi(-\frac32),\Pi(-\frac52),\ldots,\Pi(-n-\frac12)$.

(f) Plot the values of $1/\Pi(y)$ for $y=-5,-4\frac12,-4,\ldots,3\frac12,4,4\frac12,5$. Make some guesses as to the other values and sketch the graph of $1/\Pi(y)$.

(g) Prove that

$$\frac{2^{2x}\,\Pi(x)\,\Pi(x-\frac12)}{\Pi(2x)}$$

is a constant independent of $x$. [It can be written as the limit of a quantity independent of $x$.] Evaluate the constant by setting $x=0$.

(h) Prove that for any positive integer $n$

$$\frac{n^{nx}\,\Pi(x)\,\Pi(x-\frac1n)\,\Pi(x-\frac2n)\cdots\Pi(x-\frac{n-1}n)}{\Pi(nx)}$$

is a constant independent of $x$.

(i) Let $u_n$ denote the constant of (h) for $n=1,2,3,\ldots$. Use the formulas of (d) to write $u_n^2$ in terms of $\pi$ and $n-1$ values of $\sin x$. Find $u_n$ for $n=2,3,4,6$ and guess the general formula.

(j) Verify the guess of (i) for odd values of $n$ using the formula for $\sin nA$ as a product (derived in the proof of formula (3') of the text). Using an analogous trigonometric identity, prove the formula for even values of $n$. The result is the multiplication formula

$$\Pi(nx)=\frac{n^{nx+\frac12}}{(2\pi)^{(n-1)/2}}\,\Pi(x)\,\Pi\left(x-\frac1n\right)\cdots\Pi\left(x-\frac{n-1}n\right).$$

(k) Using a formula from the text and a change of variable, prove that

$$\Pi\left(n-\frac12\right)=\int_0^\infty t^{n-1/2}e^{-t}\,dt\qquad(n=0,1,2,\ldots).$$

(l) Show that the improper integral $\int_0^\infty t^ne^{-t}\,dt$ converges for $n$ a positive integer or zero, and use integration by parts to show that its value is $n!=\Pi(n)$.

(m) For what values of $x$ does the improper integral $\int_0^\infty t^xe^{-t}\,dt$ converge? Since its value is $\Pi(x)$ for $x=-\frac12,0,\frac12,1,\frac32,\ldots$ it is reasonable to guess that its value is $\Pi(x)$ whenever it converges. This is proved in the following exercise.

(n) Notation. The gamma function $\Gamma(x)$ is defined by $\Gamma(x)=\Pi(x-1)$. There is no apparent justification for this awkward notation other than tradition. Any equation involving the $\Gamma$-function can be immediately translated into an equation involving the $\Pi$-function, and the result will usually be simpler. Translate the multiplication formula for $\Pi$ (see (j)) into the multiplication formula of the $\Gamma$-function.


11 Binomial coefficients. The binomial coefficient $\binom xn$ is defined for all real numbers $x$ and for all integers $n\ge0$ to be

$$\binom xn=\frac{x(x-1)(x-2)\cdots(x-n+1)}{n!}=\frac{\Pi(x)}{\Pi(n)\,\Pi(x-n)}.$$

[These numbers occur in the power series expansion of $(1+t)^x$, namely $(1+t)^x=1+xt+\frac{x(x-1)}2t^2+\cdots+\binom xnt^n+\cdots$. See §8.4.]

(a) Show that $1/\Pi(x)=\lim_{n\to\infty}\binom{x+n}n n^{-x}$. Thus $\Pi(x)$ can be expressed in terms of a limit of reciprocals of binomial coefficients. The integral representation $\Pi(x)=\int_0^\infty t^xe^{-t}\,dt$ can be obtained from an integral representation (Euler's first integral) of the reciprocal binomial coefficients, namely the integral

$$C(x,y)=(x+y+1)\int_0^1t^x(1-t)^y\,dt.$$

(b) For what values of $x,y$ does the integral $C(x,y)$ converge?

(c) Show that $C(x,y)=C(y,x)$ and that $C(x,0)=1$ for all $x$.

(d) Integrating by parts [use the derivative of $t^{x+y+1}(t^{-1}-1)^y$] show that

$$C(x,y)=\frac y{x+y}\,C(x,y-1).$$

(e) Conclude from the above that

$$C(x,n)=\frac1{\binom{x+n}n},$$

hence $\Pi(x)=\lim_{n\to\infty}n^xC(x,n)$.

(f) Using the change of variable $u=nt$ in the integral $C(x,n)$, show that for very large integers $n$ the integral $n^xC(x,n)$ is nearly $\int_0^\infty u^xe^{-u}\,du$.

(g) Formulate the equation $\Pi(x)=\int_0^\infty u^xe^{-u}\,du$ as an interchange of limits. This interchange will be justified in §9.7.

(h) In the integral

$$\Pi(x)\,\Pi(y)=\int_0^\infty\int_0^\infty s^xe^{-s}\,t^ye^{-t}\,ds\,dt$$

perform the change of variables $u=s+t$, $v=s/(s+t)$ (hence $s=uv$, $t=u(1-v)$) to obtain

$$\Pi(x)\,\Pi(y)=\Pi(x+y+1)\int_0^1v^x(1-v)^y\,dv=C(x,y)\,\Pi(x+y).$$

Thus $C(x,y)$ is the reciprocal binomial coefficient

$$C(x,y)=\frac{\Pi(x)\,\Pi(y)}{\Pi(x+y)}$$

for all values of $x,y$ for which it converges. [No elaborate discussion of the change of variables in the improper integral $\Pi(x)\Pi(y)$ is necessary since the integrand is positive and therefore absolutely convergent, hence can be 'summed' in any order.]

(i) Notation. The beta function is the function defined by the integral

$$B(x,y)=\int_0^1t^{x-1}(1-t)^{y-1}\,dt.$$

Express $B(x,y)$ in terms of the $\Pi$-function and in terms of the $\Gamma$-function.

12 The 'volume' of the $n$-dimensional ball $B=\{x_1^2+x_2^2+\cdots+x_n^2\le r^2\}$ is

$$\frac{\pi^{n/2}}{\Pi(n/2)}\,r^n=\int_Bdx_1\,dx_2\cdots dx_n.$$

(a) Verify this formula in the cases $n=1,2,3$.

(b) Prove the formula as follows: The integral $\pi^{n/2}=\int_{-\infty}^\infty\cdots\int_{-\infty}^\infty e^{-r^2}\,dx_1\,dx_2\cdots dx_n$, where $r=\sqrt{x_1^2+x_2^2+\cdots+x_n^2}$, follows from a formula in the text. Write $dx_1\,dx_2\cdots dx_n=r^{n-1}\,dr\,\omega$ where $\omega$ is a particular closed $(n-1)$-form. (See Exercise 4, §8.3.) By Stokes' Theorem the 'volume' of the unit ball is $n^{-1}$ times the integral of $\omega$ over any $(n-1)$-dimensional sphere $r=\text{const}$. Carrying out the integral of $\omega$ in the formula for $\pi^{n/2}$ it becomes a constant times $\int_0^\infty e^{-r^2}r^{n-1}\,dr$. Setting $u=r^2$, this number can be expressed in terms of the function $\Pi$ and the desired formula follows.


13 Euler's formula

$$1-3+5-7+9-11+\cdots=0$$

is often held up to ridicule by people who are foolish enough to imagine that Euler had an inadequate grasp of the notion of convergence. Prove that the formula is true when it is interpreted as an Abel sum. [To sum the series set $y^2=x$, factor out $y^2$, and integrate once. The function $y+\frac1y$ has a minimum at $y=1$.]

14 The Cesaro sum of a series $a_1+a_2+a_3+\cdots$ is defined to be the limit, if it exists, of the arithmetic means of the partial sums, that is,

$$\lim_{N\to\infty}\frac1N\left[S_1+S_2+S_3+\cdots+S_N\right]$$

where

$$S_N=a_1+a_2+\cdots+a_N.$$

When this limit exists the series is said to be Cesaro summable.

(a) Show that $1-1+1-1+1-1+\cdots$ is Cesaro summable and that its Cesaro sum is $\frac12$.

(b) Prove that a convergent series is Cesaro summable and that its Cesaro sum is its ordinary sum. [Using $S=\frac1N[S+S+\cdots+S]$ this reduces to the statement that if $x_n$ is a sequence such that $\lim x_n=0$ then $\lim\frac1N[x_1+x_2+\cdots+x_N]=0$, which is easily proved.] Show that the series of Exercise 13 is not Cesaro summable.
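Exercises 13 and 14 can be explored numerically. In the Python sketch below (function names hypothetical), `abel_euler` evaluates a long truncation of $x-3x^2+5x^3-\cdots$, which should approach 0 as $x\uparrow1$; the comparison value $x(1-x)/(1+x)^2$ is the closed form that the hint of Exercise 13 leads to. `cesaro_means` computes the arithmetic means of the partial sums: for $1-1+1-\cdots$ they settle near $\frac12$, while for Euler's series they keep alternating near $\frac12$ and $-\frac12$.

```python
def abel_euler(x, terms=200000):
    """Truncation of x - 3x^2 + 5x^3 - 7x^4 + ... (coefficient (-1)^(n+1)(2n-1) on x^n)."""
    total, xn, sign = 0.0, 1.0, 1
    for n in range(1, terms + 1):
        xn *= x
        total += sign * (2 * n - 1) * xn
        sign = -sign
    return total

def cesaro_means(terms):
    """Arithmetic means (S_1 + ... + S_N)/N of the partial sums S_N = a_1 + ... + a_N."""
    means, s_n, running = [], 0, 0
    for N, a in enumerate(terms, start=1):
        s_n += a            # S_N
        running += s_n      # S_1 + S_2 + ... + S_N
        means.append(running / N)
    return means

for x in (0.9, 0.99, 0.999):
    print(x, abel_euler(x), x * (1 - x) / (1 + x) ** 2)
grandi = [(-1) ** (n + 1) for n in range(1, 1001)]
euler = [(-1) ** (n + 1) * (2 * n - 1) for n in range(1, 1001)]
print(cesaro_means(grandi)[-2:], cesaro_means(euler)[-2:])
```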


9.7 Lebesgue Integration

An ordinary integral $\int_a^bf(x)\,dx$ is defined as a limit of sums $\sum f(x_i)\,\Delta x_i$ formed by subdividing the (finite) interval $\{a\le x\le b\}$ into small intervals, letting $\Delta x_i$ be the length of the $i$th interval of the subdivision, and letting $x_i$ be a point of the $i$th interval; if it is true that for every $\epsilon > 0$ there is a $\delta > 0$ such that the resulting sum is determined to within $\epsilon$ whenever all $\Delta x_i$'s are less than $\delta$ (regardless of the choice of subdivision and points $x_i$) the integral is said to converge and the real number it determines is denoted $\int_a^bf(x)\,dx$.
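The definition can be imitated directly. In this Python sketch the subdivision points and the points $x_i$ are chosen at random (the random choices, the seed and the helper name are assumptions for the illustration); for the continuous function $x^2$ on $\{0\le x\le1\}$ the sums cluster near $\frac13$ once the subdivision is fine.

```python
import random

def riemann_sum(f, a, b, pieces, rng):
    """One Riemann sum: cut [a, b] at pieces-1 random points, evaluate f at a
    random point x_i of each piece, and return the sum of f(x_i) * dx_i."""
    cuts = sorted(rng.uniform(a, b) for _ in range(pieces - 1))
    points = [a] + cuts + [b]
    total = 0.0
    for left, right in zip(points, points[1:]):
        total += f(rng.uniform(left, right)) * (right - left)
    return total

rng = random.Random(0)  # fixed seed, so the run is reproducible
print([riemann_sum(lambda x: x * x, 0.0, 1.0, 20000, rng) for _ in range(5)])
```

Random subdivisions only make the largest $\Delta x_i$ small with high probability; the definition requires the estimate for every subdivision with all $\Delta x_i < \delta$.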


There are, however, cases in which an integral $\int_a^bf(x)\,dx$ does not converge but in which there is no doubt as to what the value of the integral 'should' be, for example the integral

$$(1)\qquad\int_{-1}^1\frac{dx}{\sqrt{|x|}}.$$

(The integrand is not defined at zero so, for the sake of definiteness, the value at 0 will be defined to be 0.) Because the integrand is unbounded, it is easily seen that the integral (1) does not converge. But if a small interval $\{-\epsilon_1 < x < \epsilon_2\}$ ($\epsilon_1,\epsilon_2$ small positive numbers) is excluded then the integral converges and its value is found to be

$$\int_{-1}^{-\epsilon_1}d\left[-2\sqrt{|x|}\right]+\int_{\epsilon_2}^1d\left[2\sqrt x\right]=-2\sqrt{\epsilon_1}+2+2-2\sqrt{\epsilon_2}=4-2\left(\sqrt{\epsilon_1}+\sqrt{\epsilon_2}\right)$$

by the Fundamental Theorem. Thus the integral approaches the value 4 as the excluded interval is made small. Another way to 'evaluate' the non-convergent integral (1) would be to 'cut off the top', that is, to set $f_K(x)=\min(|x|^{-1/2},K)$, to evaluate the convergent integral $\int_{-1}^1f_K(x)\,dx$, and to let $K\to\infty$. This gives

$$\int_{-1}^{-K^{-2}}\frac{dx}{\sqrt{|x|}}+\int_{-K^{-2}}^{K^{-2}}K\,dx+\int_{K^{-2}}^1\frac{dx}{\sqrt{|x|}}=\left(2-2K^{-1}\right)+2K\cdot K^{-2}+\left(2-2K^{-1}\right)=4-\frac2K$$

and again as $K\to\infty$ the limit is 4. Thus, even though the integral (1) is not convergent, its value 'should' be 4.
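The 'cut off the top' computation can be checked with ordinary Riemann sums, since each $f_K$ is bounded and continuous. A Python sketch using equal subintervals with midpoints as the chosen points (the number of subintervals and the sample values of $K$ are illustrative choices):

```python
import math

def midpoint_sum(f, a, b, pieces):
    # Riemann sum with equal subintervals and their midpoints as the chosen points
    h = (b - a) / pieces
    return h * sum(f(a + (k + 0.5) * h) for k in range(pieces))

def f_cut(K):
    # f_K(x) = min(|x|^{-1/2}, K), taking the value K at x = 0 (consistent with the min)
    return lambda x: K if x == 0 else min(abs(x) ** -0.5, K)

for K in (2.0, 10.0, 100.0):
    approx = midpoint_sum(f_cut(K), -1.0, 1.0, 200000)
    print(K, approx, 4 - 2 / K)
```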

As this example indicates, there is an extended definition of $\int_a^bf(x)\,dx$ which coincides with the definition already given for all convergent integrals $\int_a^bf(x)\,dx$ but which is valid for other integrals, for example (1), which are not convergent in the original definition. This extended definition is called the Lebesgue integral in contradistinction to the Riemann integral which is the definition given above.

Theorem

Lebesgue integration. A function* $f(x)$ on an interval $\{a\le x\le b\}$ is said to be Lebesgue integrable if† there exists a sequence $A_1(x),A_2(x),A_3(x),\ldots$ of functions on $\{a\le x\le b\}$ such that:

(i) $\lim_{n\to\infty}A_n(x)=f(x)$ for all $x$ in the interval $\{a\le x\le b\}$.

(ii) The (Riemann) integrals $\int_a^bA_n(x)\,dx$ converge ($n=1,2,3,\ldots$), that is, the approximating sums approach a limit in the sense defined above.

(iii) Cauchy Criterion. The condition $\lim_{n,m\to\infty}\int_a^b|A_n(x)-A_m(x)|\,dx=0$ is satisfied. More specifically, the integral $\int_a^b|A_n(x)-A_m(x)|\,dx$ (which converges by dint of (ii)) is less than any preassigned $\epsilon$ for all sufficiently large $n,m$.

If $f(x)$ is Lebesgue integrable then the definition‡

$$\int_a^bf(x)\,dx=\lim_{n\to\infty}\int_a^bA_n(x)\,dx$$

is valid, that is, the limit on the right exists and if $B_n(x)$ is any other sequence of functions satisfying the conditions (i)-(iii) then

$$\lim_{n\to\infty}\int_a^bB_n(x)\,dx=\lim_{n\to\infty}\int_a^bA_n(x)\,dx.$$

*In accordance with the previous chapters, the integrand should be considered to be a 1-form $f(x)\,dx$, not a function $f(x)$. The distinction is of course purely pedantic and the common usage is to consider the integrand as a function and to let $dx$ represent the 'measure' used in defining $\int_a^bf(x)\,dx$, namely 'oriented length'.

†But not 'only if'. See below.

‡Note the similarity of this theorem to the theorem of §6.3. The integral is defined by a process which involves choices, and it must be shown that the result is independent of these choices.

Examples and Applications


a. The function of (1) is Lebesgue integrable (defining its value at 0 to be 0) as is shown by the sequence

$$A_n(x)=\begin{cases}\dfrac1{\sqrt{|x|}}&\dfrac1n\le|x|\le1\\[4pt]0&|x| < \dfrac1n.\end{cases}$$

The integral $\int_{-1}^1f(x)\,dx$ is therefore defined and is $\lim\int_{-1}^1A_n(x)\,dx=4$. If the sequence

$$B_n(x)=\begin{cases}\min\left(n,\dfrac1{\sqrt{|x|}}\right)&x\ne0\\[4pt]0&x=0\end{cases}$$

is used instead, the value

$$\int_{-1}^1f(x)\,dx=\lim_{n\to\infty}\int_{-1}^1B_n(x)\,dx=4$$

is the same. The fact that the sequences $A_n(x)$, $B_n(x)$ satisfy the Cauchy Criterion (iii) is immediately verified using the Fundamental Theorem.


b. If $f(x)$ is Riemann integrable, that is, if $\int_a^bf(x)\,dx$ converges in the ordinary sense, then one can set $A_n(x)=f(x)$ (all $n$). Thus $f(x)$ is Lebesgue integrable and the new definition of $\int_a^bf(x)\,dx$ coincides with the old one. This shows that the new definition is indeed an extension of the old one.

c. The Lebesgue integral $\int_{-1}^1|x|^{-\alpha}\,dx$ is

$$\int_{-1}^1|x|^{-\alpha}\,dx=\int_{-1}^0d\left[-\frac{|x|^{1-\alpha}}{1-\alpha}\right]+\int_0^1d\left[\frac{x^{1-\alpha}}{1-\alpha}\right]=\frac2{1-\alpha}$$

provided $\alpha < 1$. This is a simple extension of the first example. If $\alpha\le0$ this integral converges as a Riemann integral. If $\alpha\ge1$ the function $|x|^{-\alpha}$ is not Lebesgue integrable on $\{-1\le x\le1\}$ (see Exercise 6).


d. Let $f(x)$ be the function which is 1 if $x$ is a rational number and 0 if $x$ is irrational. Then $f(x)$ is Lebesgue integrable on $\{a\le x\le b\}$ and $\int_a^bf(x)\,dx=0$. This is seen by defining

$$A_n(x)=\begin{cases}1&\text{if }x\text{ is a rational number with a denominator}\le n\\0&\text{otherwise.}\end{cases}$$

Then $A_n$ and $|A_n-A_m|$ are zero except at a finite number of points (where they are 1) which implies that $\int_a^bA_n(x)\,dx=0$ as a Riemann integral and that the Cauchy Criterion is satisfied. For any fixed $x$, $\lim A_n(x)=f(x)$, hence $\int_a^bf(x)\,dx=0$.


e. Let $f(x)$ be the function which is zero if the decimal expansion of the real number $x$ contains the digit 5 and which is one otherwise. (If $x$ has two decimal expansions, that is, if $x$ is a terminating decimal fraction, then $f(x)$ is ill-defined if one of these expansions has a 5 and the other does not, e.g. if $x=.6=.5999\ldots$. For the sake of definiteness $f(x)$ will be defined to be one if either expansion is free of 5's, e.g. $f(.6)=1$.) Then $\int_0^1f(x)\,dx$ is defined (as a Lebesgue integral) and $\int_0^1f(x)\,dx=0$. This can be proved by setting

$$A_n(x)=\begin{cases}0&\text{if the first }n\text{ places of the decimal expansion of }x\text{ contain a 5}\\1&\text{if not}\end{cases}$$

(with $A_3(.260)=A_3(.2599\ldots)=1$, etc., as for $f$). Then $A_1(x)$ is zero for $\{.5 < x < .6\}$ and one elsewhere in $\{0\le x\le1\}$. Thus $\int_0^1A_1(x)\,dx=\frac9{10}$. Since $A_2(x)$ is zero on $\{.5 < x < .6\}$ and on the nine intervals $\{.05 < x < .06\},\ldots,\{.95 < x < .96\}$ omitting $\{.55 < x < .56\}$ it follows in the same way that $\int_0^1A_2(x)\,dx=\frac{81}{100}$. A similar argument shows that

$$\int_0^1A_n(x)\,dx=\left(\frac9{10}\right)^n.$$

Thus $\int_0^1A_n(x)\,dx\to0$ as $n\to\infty$, and moreover

$$\int_0^1|A_n(x)-A_m(x)|\,dx\le\int_0^1A_n(x)\,dx+\int_0^1A_m(x)\,dx\to0$$

as well. Therefore the conditions (i)-(iii) are satisfied and $\int_0^1f(x)\,dx=0$. In the theory of probability, this fact is stated, "The probability that a real number selected at random will have no 5's in its decimal expansion is zero." This accords with the common experience that the probability of rolling a die infinitely often without rolling a 5 is zero.
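The formula $\int_0^1A_n(x)\,dx=(9/10)^n$ amounts to counting: each block of $n$ leading digits $d_1d_2\ldots d_n$ is an interval of length $10^{-n}$, and (apart from endpoints, which do not affect the integral) $A_n$ is 1 on it exactly when no $d_i$ is 5. A brute-force Python sketch of that count, with a hypothetical helper name:

```python
from fractions import Fraction
from itertools import product

def integral_A(n):
    """Integral of A_n over [0, 1]: the fraction of n-digit blocks d_1...d_n
    (each an interval of length 10^-n) that contain no digit 5."""
    free = sum(1 for digits in product(range(10), repeat=n) if 5 not in digits)
    return Fraction(free, 10 ** n)

for n in range(1, 5):
    print(n, integral_A(n), Fraction(9, 10) ** n)
```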


f. Lebesgue integrals combine in all the same ways as ordinary integrals, e.g.

$$\int_a^b(f(x)+g(x))\,dx=\int_a^bf(x)\,dx+\int_a^bg(x)\,dx$$
$$\int_a^bcf(x)\,dx=c\int_a^bf(x)\,dx\qquad(c=\text{const.})$$
$$\int_a^b|f(x)|\,dx\ge\left|\int_a^bf(x)\,dx\right|$$
$$\int_a^b\max(f,g)\,dx\ge\max\left(\int_a^bf(x)\,dx,\int_a^bg(x)\,dx\right)$$

etc., where the integrability of these functions is a consequence of the integrability of $f$ and $g$. [The function $\max(f,g)$ assigns to each $x$ the larger of the two numbers $f(x)$, $g(x)$.] These statements follow immediately from the analogous statements for ordinary Riemann integrals. The last inequality above has the important consequence that

$$f(x)\ge g(x)\text{ on }\{a\le x\le b\}\quad\text{implies}\quad\int_a^bf(x)\,dx\ge\int_a^bg(x)\,dx.$$

[$f\ge g$ means $\max(f,g)=f$.]


Proof 


Given a sequence of Riemann integrable functions $A_n$, the Riemann integrability of $|A_n - A_m|$ is easily proved. (The difference of Riemann integrable functions and the absolute value of a Riemann integrable function are Riemann integrable; see Exercise 1.) If the Cauchy Criterion $\lim_{n,m\to\infty} \int_a^b |A_n - A_m|\,dx = 0$ is satisfied then the triangle inequality $\left|\int_a^b A_n\,dx - \int_a^b A_m\,dx\right| \le \int_a^b |A_n - A_m|\,dx$ implies that $\lim_{n\to\infty} \int_a^b A_n\,dx$ exists. The difficult part of the proof is to show that if $B_n$ is another sequence satisfying (i)–(iii) and if $\lim B_n(x) = f(x) = \lim A_n(x)$, then $\lim \int_a^b B_n(x)\,dx = \lim \int_a^b A_n(x)\,dx$. Setting

$$C_n(x) = B_n(x) - A_n(x),$$

the triangle inequality shows that $C_n(x)$ satisfies the Cauchy Criterion and the statement to be proved becomes: If $C_n(x)$ is a sequence of Riemann integrable functions on $\{a \le x \le b\}$ such that (i) $\lim_{n\to\infty} C_n(x) = 0$ for all $x$ in $\{a \le x \le b\}$, and such that (ii)

$$\lim_{n,m\to\infty} \int_a^b |C_n(x) - C_m(x)|\,dx = 0,$$

then $\lim_{n\to\infty} \int_a^b C_n(x)\,dx = 0$. Seen in this light, the theorem simply states that the interchange of limits

$$\lim_{n\to\infty} \int_a^b C_n(x)\,dx = \int_a^b \lim_{n\to\infty} C_n(x)\,dx$$

is valid when $\lim_{n,m\to\infty} \int_a^b |C_n(x) - C_m(x)|\,dx = 0$ and $C_n(x) \to 0$ for all $x$ in the interval.

*The inequality $|C_n - C_m| \ge \big||C_n| - |C_m|\big|$ shows that if $C_n$ satisfies the Cauchy Criterion then $|C_n|$ does.

Note first that it suffices to consider the case $C_n(x) \ge 0$, since the general case will then follow by considering* $|C_n(x)|$ (note that $\left|\int_a^b C_n\,dx\right| \le \int_a^b |C_n|\,dx$). Given a sequence $C_n(x)$ of non-negative Riemann integrable functions such that $\lim C_n(x) = 0$ and such that $\lim_{n,m\to\infty} \int_a^b |C_n(x) - C_m(x)|\,dx = 0$, let $D_{n,K}$ denote the function $\min(C_n, K)$ for $K$ a constant. (If $K \le 0$ then $D_{n,K} = K$ and only the cases $K > 0$ are of interest.) Intuitively, $D_{n,K}$ is ‘$C_n$ with the top cut off at $K$’. For each fixed $K > 0$ the sequence $D_{n,K}(x)$ $(n = 1, 2, 3, \ldots)$ is a sequence of Riemann integrable functions. Moreover, $\lim_{n\to\infty} D_{n,K}(x) = 0$ for all $x$ and, since $|D_{n,K} - D_{m,K}| \le |C_n - C_m|$,

$$\lim_{n,m\to\infty} \int_a^b |D_{n,K} - D_{m,K}|\,dx = 0.$$

Hence $\lim_{n\to\infty} \int_a^b D_{n,K}(x)\,dx$ exists and, if the theorem is true, this limit is zero for all $K$.

Let $I_K = \lim_{n\to\infty} \int_a^b [C_n - D_{n,K}]\,dx$. Intuitively $I_K$ measures the amount of $C_n$ which, in the limit as $n \to \infty$, lies above $K$. If the theorem is true then

$$I_K = \lim_{n\to\infty} \int_a^b C_n\,dx - \lim_{n\to\infty} \int_a^b D_{n,K}\,dx = 0 - 0 = 0$$

for all $K > 0$. Conversely, if it can be shown that $I_K = 0$ for all $K > 0$, then $\lim \int_a^b C_n\,dx = \lim \int_a^b D_{n,K}\,dx \le K(b - a)$ for all $K > 0$ and the desired conclusion $\lim \int_a^b C_n\,dx = 0$ will follow.


If $K > K'$ then $I_K \le I_{K'}$. Moreover, for every $\epsilon > 0$ there is a $K$ such that $I_K < \epsilon$; this is proved by choosing $N$ so large that $\int_a^b |C_n - C_N|\,dx < \epsilon/2$ whenever $n \ge N$ and by then choosing $K$ so large that $C_N(x) < K$ for all $x$ in $\{a \le x \le b\}$ (a Riemann integrable function is bounded). Then $C_N = D_{N,K}$, hence

$$\begin{aligned} \int_a^b (C_n - D_{n,K})\,dx &= \int_a^b [(C_n - D_{n,K}) - (C_N - D_{N,K})]\,dx \\ &\le \int_a^b |C_n - C_N|\,dx + \int_a^b |D_{n,K} - D_{N,K}|\,dx \\ &< 2 \cdot (\epsilon/2) = \epsilon \end{aligned}$$




for all $n \ge N$, hence $I_K \le \epsilon$. It follows that if the desired conclusion ($I_K = 0$ for $K > 0$) is false then there must be positive numbers $K, K', c$ such that $I_{K'} - I_K > c$ $(K > K')$. Now $I_{K'} - I_K = \lim_{n\to\infty} \int_a^b (D_{n,K} - D_{n,K'})\,dx$, hence if this is the case there is an $N$ such that $\int_a^b (D_{n,K} - D_{n,K'})\,dx > c$ for $n \ge N$. The integral $\int_a^b (D_{n,K} - D_{n,K'})\,dx$ is a limit of approximating sums $\sum (D_{n,K} - D_{n,K'})(x_i)\,\Delta x_i$, where $D_{n,K}(x) - D_{n,K'}(x)$ is zero if $C_n(x) \le K'$, is $C_n(x) - K'$ if $K' \le C_n(x) \le K$, and is $K - K'$ if $C_n(x) \ge K$. Given $n \ge N$ let $\delta_n$ be so small that any approximating sum to the integral is greater than $c$ provided that the subdivision is finer than $\delta_n$, and let $S_n$ be a subdivision finer than $\delta_n$. Choose an $x_i$ in each interval of $S_n$ in such a way that $D_{n,K}(x_i) - D_{n,K'}(x_i) = 0$ if this is possible, i.e. if the interval contains any point $x$ where $C_n(x) \le K'$. Let $T_n$ denote the set of intervals of $S_n$ where this is not possible, i.e. where all values of $C_n(x)$, including the values at the end points, are at least $K'$. Since every term of this approximating sum is at most $(K - K')\,\Delta x_i$ and the terms outside $T_n$ are zero,

$$c < (K - K')(\text{total length of } T_n),$$

it follows that if $\int_a^b C_n(x)\,dx$ does not approach zero then there are positive numbers $K'$ and $c' = c/(K - K')$ and an integer $N$ such that for every $n \ge N$ there is a finite collection $T_n$ of (closed) intervals of total length greater than $c'$ on which $C_n(x)$ assumes no value less than $K'$. It is to be shown that this contradicts the assumptions $C_n(x) \to 0$ and $\int_a^b |C_n - C_m|\,dx \to 0$.

The sequence $C_n(x)$ can be ‘weeded out’ so that the Cauchy Criterion takes the form

$$\int_a^b |C_n(x) - C_{n+1}(x)|\,dx < 10^{-n}. \qquad (2)$$

One need only find an $N$ such that $\int_a^b |C_n - C_N|\,dx < \frac{1}{10}$ for $n \ge N$, discard the first $N - 1$ functions in the sequence, and renumber so that the new $C_n$ is the old $C_{n+N-1}$; then find an $N$ such that $\int_a^b |C_n - C_N|\,dx < \frac{1}{100}$ for $n \ge N$, discard $C_2, C_3, \ldots, C_{N-1}$, and renumber. Continuing in this way, (2) is satisfied.
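The ‘weeding out’ step only needs, for each tolerance $10^{-k}$, an index beyond which the sequence varies by less than that tolerance. As a hedged illustration (not from the text), the sketch below applies the same procedure to a Cauchy sequence of real numbers instead of functions: the partial sums $a_n = \sum_{i \le n} 1/i^2$, for which $|a_m - a_n| < 1/n$ whenever $m > n$. The names `a`, `modulus` and `weed` are hypothetical.

```python
import math


def a(n):
    """Partial sum 1 + 1/4 + ... + 1/n**2 (a Cauchy sequence of reals)."""
    return math.fsum(1.0 / (i * i) for i in range(1, n + 1))


def modulus(eps):
    """An N with |a_n - a_m| < eps for all m > n >= N.

    Uses the bound a_m - a_n < sum_{i > n} 1/(i*(i-1)) = 1/n.
    """
    return math.ceil(1.0 / eps)


def weed(k_max):
    """Indices n_1 < n_2 < ... < n_{k_max+1} with
    |a(n_k) - a(n_{k+1})| < 10**-k, the analogue of (2)."""
    indices = []
    for k in range(1, k_max + 2):
        n = modulus(10.0 ** -k)
        if indices:
            n = max(n, indices[-1] + 1)  # keep the indices increasing
        indices.append(n)
    return indices


idx = weed(4)
for k, (n, m) in enumerate(zip(idx, idx[1:]), start=1):
    print(k, n, m, abs(a(n) - a(m)))
```

As in the text, each kept index lies beyond the point where the tolerance $10^{-k}$ is reached, so consecutive kept terms differ by less than $10^{-k}$.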

Now since $\int_a^b |C_2(x) - C_1(x)|\,dx < \frac{1}{10}$ there is a $\delta$ such that any approximating sum to this integral based on a subdivision finer than $\delta$ is also less than $\frac{1}{10}$. Consider the set where $|C_2(x) - C_1(x)| \ge \frac12$. Roughly speaking, this set cannot be large because the integral is small. Quantitatively, let $S$ be a subdivision of $\{a \le x \le b\}$ finer than $\delta$, and let $U_1$ be the collection of intervals of $S$ which contain points $x$ where $|C_2(x) - C_1(x)| \ge \frac12$. By forming an approximating sum to $\int_a^b |C_2(x) - C_1(x)|\,dx$ by choosing such points wherever possible, it follows that

$$\tfrac12 \cdot (\text{total length of intervals in } U_1) < \tfrac{1}{10},$$

hence the intervals of $U_1$ have total length less than $\frac15$. Extending the intervals of $U_1$ slightly but keeping their total length less than $\frac15$, they can be made to include all points $x$ where $|C_2(x) - C_1(x)| \ge \frac12$ in their interiors.
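The estimate just used is a form of what is now usually called Markov’s (or Chebyshev’s) inequality: if an approximating sum to $\int_a^b |h|\,dx$ is formed by choosing, wherever possible, a point where $|h| \ge \lambda$, then $\lambda$ times the total length of those intervals is at most the sum. The sketch below checks this for one sample function. The function $h(x) = x^4$, the level $\lambda = \frac12$ and the helper name are illustrative assumptions, and points are searched only on a finite grid inside each interval, so only sampled points can place an interval in $U$.

```python
def covering_estimate(h, a, b, m, lam, samples=20):
    """Subdivide {a <= x <= b} into m equal intervals.

    In each interval choose, if possible, a sampled point where |h| >= lam
    (as in the text); otherwise use the left end point.  Return
    (total length of the intervals in U, the approximating sum).
    """
    dx = (b - a) / m
    length_u = 0.0
    approx_sum = 0.0
    for i in range(m):
        left = a + i * dx
        points = [left + dx * s / samples for s in range(samples + 1)]
        big = [x for x in points if abs(h(x)) >= lam]
        x_i = big[0] if big else left
        approx_sum += abs(h(x_i)) * dx
        if big:
            length_u += dx  # this interval belongs to U
    return length_u, approx_sum


length_u, approx_sum = covering_estimate(lambda x: x ** 4, 0.0, 1.0, 1000, 0.5)
# Each interval of U contributes at least lam * dx to the sum.
print(length_u, approx_sum, 0.5 * length_u <= approx_sum)
```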

By exactly the same argument, the set of all points $x$ for which $|C_3(x) - C_2(x)| \ge 2^{-2}$ can be contained in the interiors of a finite collection $U_2$ of intervals whose total length is less than $2^2 \cdot 10^{-2} = 5^{-2}$, and the set of all points $x$ for which $|C_{n+1}(x) - C_n(x)| \ge 2^{-n}$ can be contained in the interiors of a finite collection $U_n$ of intervals of total length less than $5^{-n}$.

Consider now the set where $C_1(x) > 1$. Since $\lim_{n\to\infty} C_n(x) = 0$ there must be an $N$ such that

$$\begin{aligned} |C_1(x) - C_N(x)| &> 1, \\ |C_1(x) - C_2(x)| + |C_2(x) - C_3(x)| + \cdots + |C_{N-1}(x) - C_N(x)| &> 1, \end{aligned}$$

hence, since $2^{-1} + 2^{-2} + \cdots + 2^{-(N-1)} < 1$, there must be at least one $n$ such that

$$|C_{n+1}(x) - C_n(x)| \ge 2^{-n};$$

that is, if $C_1(x) > 1$ then $x$ must be interior to an interval of $U_n$ for some $n$. Therefore the set where $C_1(x) > 1$ can be contained in the interiors of an infinite collection of intervals whose total length* is less than $5^{-1} + 5^{-2} + 5^{-3} + \cdots = \frac14$. In the same way, if $C_2(x) > \frac12$ then $x$ must be interior to one of the intervals of $U_n$ for $n \ge 2$, and these intervals have total length less than $5^{-2} + 5^{-3} + 5^{-4} + \cdots = \frac{1}{20}$. Finally, the set of all $x$ for which $C_n(x) > 2^{1-n}$ can be contained in the interiors of an infinite collection of intervals (those of $U_n, U_{n+1}, U_{n+2}, \ldots$) of total length† less than $4^{-1} \cdot 5^{1-n}$.

*Because the lengths are positive, their total is well defined even though there are infinitely many of them. Very simply, to say that their total is less than $4^{-1}$ means that the total length of any finite number of the intervals is less than $4^{-1}$.

†Again meaning that the total length of any finite number of them is less than $4^{-1} \cdot 5^{1-n}$.

It was shown above that if $\int_a^b C_n(x)\,dx$ does not approach zero as $n \to \infty$ then there exist positive numbers $c', K'$ and an integer $N$ such that for every $n \ge N$ there is a finite collection $T_n$ of (closed) intervals of total length at least $c'$ on which all values of $C_n(x)$ are $\ge K'$. Choose $n$ so large that $c' > 4^{-1} \cdot 5^{1-n}$, $K' > 2^{1-n}$, $n \ge N$, and let $T_n$ be as above. Every point of $T_n$ is interior to at least one of the intervals of $U_n, U_{n+1}, U_{n+2}, \ldots$. Since $T_n$ is compact, the Heine–Borel Theorem states that it can be covered by a finite number of intervals of $U_n, U_{n+1}, U_{n+2}, \ldots$, and hence by a finite number of intervals whose total length is less than $4^{-1} \cdot 5^{1-n} < c'$. But a finite number of intervals of total length $\ge c'$ cannot be covered by a finite number of intervals of total length $< c'$, which shows that the assumption that $\int_a^b C_n(x)\,dx$ does not approach zero is untenable. Therefore

$$\lim_{n\to\infty} \int_a^b C_n(x)\,dx = 0$$

and the theorem is proved.


An important strengthening of the theorem results from the observation that the proof just given is still valid even if there are points $x$ in $\{a \le x \le b\}$ where $\lim_{n\to\infty} C_n(x) \ne 0$, provided that there are so ‘few’ of them that they cannot account for the length $c'$ of the intervals $T_n$ which result from the assumption that $\int_a^b C_n(x)\,dx$ does not approach zero. For this purpose it suffices to assume that for every $\epsilon > 0$ there is a (possibly infinite) collection of intervals of total length less than $\epsilon$ which contain in their interiors all points $x$ where $\lim_{n\to\infty} C_n(x) = 0$ is false. This is what it means to say that ‘$C_n(x) \to 0$ for almost all $x$’.


Definition 


A sequence $f_1(x), f_2(x), f_3(x), \ldots$ of functions on an interval $\{a \le x \le b\}$ is said to converge to a function $f(x)$ on $\{a \le x \le b\}$ ‘for almost all $x$ in $\{a \le x \le b\}$’ if the set of values of $x$ for which $\lim f_n(x) = f(x)$ is false can be contained in the interiors of a collection of intervals of arbitrarily small total length. More precisely, ‘$\lim_{n\to\infty} f_n(x) = f(x)$ for almost all $x$ in $\{a \le x \le b\}$’ means that for every $\epsilon > 0$ there is a collection of subintervals of $\{a \le x \le b\}$ such that (i) the total length of any finite number of intervals in the collection is less than $\epsilon$, and such that (ii) for any point $x$ not interior to an interval of the collection, $\lim f_n(x) = f(x)$. Similarly, two functions $f(x), g(x)$ are said to be equal ‘for almost all $x$ in $\{a \le x \le b\}$’ if the set of all values where $f(x) \ne g(x)$ can be contained in the interiors of a collection of subintervals of arbitrarily small total length.




The proof above then shows that the theorem is still true when the word ‘almost’ is inserted in condition (i). One need only note that if $A_n(x), B_n(x)$ satisfy conditions (i)–(iii) then $C_n = A_n - B_n$ satisfies $C_n(x) \to 0$ for almost all $x$. The proof then proceeds exactly as before except that one must add to the collection of intervals $U_n, U_{n+1}, U_{n+2}, \ldots$ a collection $U_\infty$ of arbitrarily small total length, outside which $C_n(x) \to 0$. Then every point $x$ of $T_n$ is either interior to an interval of $U_\infty$ or satisfies $\lim_{m\to\infty} C_m(x) = 0$, in which case it must be interior to an interval of $U_n, U_{n+1}, U_{n+2}, \ldots$. In this way $T_n$ can still be covered by a collection of intervals of arbitrarily small total length and the assumption that $\int_a^b C_n(x)\,dx$ does not approach zero can still be contradicted. This proves the following theorem.


Theorem 


Lebesgue integration, revised statement. A function $f(x)$ on the interval $\{a \le x \le b\}$ is said to be Lebesgue integrable if (and only if) there exists a sequence $A_1(x), A_2(x), A_3(x), \ldots$ of functions on $\{a \le x \le b\}$ such that:

(i) $\lim_{n\to\infty} A_n(x) = f(x)$ for almost all $x$ in $\{a \le x \le b\}$.
(ii) The Riemann integrals $\int_a^b A_n(x)\,dx$ converge $(n = 1, 2, 3, \ldots)$.
(iii) The Cauchy Criterion

$$\lim_{n,m\to\infty} \int_a^b |A_n(x) - A_m(x)|\,dx = 0$$

is satisfied.

When this is the case, the definition

$$\int_a^b f(x)\,dx = \lim_{n\to\infty} \int_a^b A_n(x)\,dx$$

is valid, that is, the limit on the right exists and depends only on $f$.


Examples and Applications 


g. If $f(x)$ is a function which is integrable in the first definition then it is integrable in the revised definition and the two definitions of the integral agree. Thus the previous examples are all examples of Lebesgue integrable functions (revised definition). Hereafter, ‘Lebesgue integrable’ will refer only to the revised definition.


h. There ‘exist’ Lebesgue integrable functions which 
are not Lebesgue integrable in the first definition. 
The proof of this fact is highly non-constructive, 
however, and no simple example of such a function 
can be given. 


i. The revised definition simplifies the treatment of the integral (1). Setting $A_n(x) = \min(n, |x|^{-1/2})$, the conditions (i)–(iii) are satisfied (even though $\lim_{n\to\infty} A_n(0)$ does not exist) no matter what value is assigned to the integrand at $x = 0$.
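To make example i concrete, the sketch below assumes that the integral (1) is $\int_{-1}^{1} |x|^{-1/2}\,dx$ (an assumption made for illustration; the interval of (1) is not restated in this section). Since $A_n(x) = n$ for $|x| \le 1/n^2$ and $A_n(x) = |x|^{-1/2}$ otherwise, a direct computation gives $\int_{-1}^{1} A_n\,dx = 2\left(\frac1n + 2 - \frac2n\right) = 4 - \frac2n$. The code compares this with a midpoint-rule Riemann sum; the helper names are hypothetical.

```python
import math


def A(n, x):
    """A_n(x) = min(n, |x|**(-1/2)), with the value n at x = 0."""
    return n if x == 0 else min(n, abs(x) ** -0.5)


def midpoint_integral(f, a, b, m=200_000):
    """Midpoint-rule Riemann sum with m equal subintervals."""
    h = (b - a) / m
    return h * math.fsum(f(a + (i + 0.5) * h) for i in range(m))


def exact_integral(n):
    """Integral of A_n over {-1 <= x <= 1}: 2 * (1/n + 2 - 2/n) = 4 - 2/n."""
    return 4 - 2 / n


for n in (1, 10, 100):
    numeric = midpoint_integral(lambda x: A(n, x), -1.0, 1.0)
    print(n, numeric, exact_integral(n))
```

Because $A_n \le A_m$ for $n \le m$, the Cauchy integrals are $\int_{-1}^{1} |A_n - A_m|\,dx = 2\left|\frac1n - \frac1m\right|$, which tend to zero, and the integrals tend to 4.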


j. If $f, g$ are two functions on $\{a \le x \le b\}$, if $f(x) = g(x)$ for almost all $x$ in the interval, and if $f(x)$ is Lebesgue integrable, then $g(x)$ is Lebesgue integrable and $\int_a^b g(x)\,dx = \int_a^b f(x)\,dx$. One can prove directly that the functions $f(x)$ of Examples d and e above are equal to zero for almost all $x$ and hence conclude that their integrals are zero (see Exercise 3).


Apart from the desire to define $\int_a^b f(x)\,dx$ for as many functions $f(x)$ as possible, an important motivation of the theory of Lebesgue integration is the desire to establish the validity of the interchange of limits

$$\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b \left(\lim_{n\to\infty} f_n(x)\right) dx \qquad (2)$$

under the weakest possible assumptions on the sequence of functions $f_n(x)$. As is shown by the examples a and d above, it is necessary to extend the definition of $\int_a^b f(x)\,dx$ beyond the ordinary (Riemann) definition to avoid cases where the integral on the right side of (2) is not defined even though it ‘should’ be. The theory of Lebesgue integration makes possible the following amazingly general theorem on the interchange (2).


Lebesgue Dominated Convergence Theorem


The interchange (2) is valid whenever there is an integrable function $F(x)$ on $\{a \le x \le b\}$ which ‘dominates’ the functions $f_n(x)$ in the sense that for each $n$ the inequality $F(x) \ge |f_n(x)|$ holds for almost all $x$. More precisely, if $f_1(x), f_2(x), f_3(x), \ldots$, and $f_\infty(x)$ are functions on $\{a \le x \le b\}$, if $\lim_{n\to\infty} f_n(x) = f_\infty(x)$ for almost all $x$, if there is a function $F(x)$ such that each $f_n(x)$ satisfies $|f_n(x)| \le F(x)$ for almost all $x$, and if $F(x)$ and $f_1(x), f_2(x), \ldots$ are all Lebesgue integrable on $\{a \le x \le b\}$, then $f_\infty(x)$ is Lebesgue integrable on $\{a \le x \le b\}$ and

$$\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f_\infty(x)\,dx.$$


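A proof of this theorem is given below. Note that the Lebesgue Dominated Convergence Theorem proves immediately that the interchange

$$\begin{aligned} \log(1 + x) &= \int_0^x \frac{dt}{1 + t} \\ &= \int_0^x \left[\lim_{n\to\infty} \left(1 - t + t^2 - t^3 + \cdots + (-t)^n\right)\right] dt \\ &= \lim_{n\to\infty} \int_0^x \left[1 - t + t^2 - t^3 + \cdots + (-t)^n\right] dt \\ &= x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \end{aligned}$$

is valid for $-1 < x \le 1$; it suffices to note that all functions are dominated by $F(t) = \max\left(\frac{1}{1 + t}, 1\right)$ and that this function is integrable on the interval in question.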
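The domination used above can be checked numerically. In the sketch below (an illustration with hypothetical helper names), `S(n, t)` is the partial sum $1 - t + t^2 - \cdots + (-t)^n$, `F(t)` is the dominating function $\max(1/(1+t), 1)$ from the text, and `integrated_partial_sum(n, x)` is $\int_0^x S(n, t)\,dt = \sum_{k=0}^{n} (-1)^k x^{k+1}/(k+1)$. At $x = 1$ the partial sums approach $\log 2$, the end point case that the theorem covers.

```python
import math


def S(n, t):
    """Partial sum 1 - t + t**2 - ... + (-t)**n of the series for 1/(1 + t)."""
    return math.fsum((-t) ** k for k in range(n + 1))


def F(t):
    """Dominating function max(1/(1 + t), 1) on -1 < t <= 1."""
    return max(1.0 / (1.0 + t), 1.0)


def integrated_partial_sum(n, x):
    """Integral from 0 to x of S(n, t): x - x**2/2 + ... + (-1)**n x**(n+1)/(n+1)."""
    return math.fsum((-1) ** k * x ** (k + 1) / (k + 1) for k in range(n + 1))


# |S(n, t)| <= F(t) on a grid of t values in -1 < t <= 1 and several n.
grid = [-0.99 + 0.01 * i for i in range(200)]
dominated = all(abs(S(n, t)) <= F(t) + 1e-12 for n in range(60) for t in grid)
print(dominated)

# At x = 1 the error is at most 1/(n + 2) (alternating series).
print(integrated_partial_sum(999, 1.0), math.log(2))
```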

The generalization of Lebesgue integration to higher dimensions and to improper integrals is straightforward. For example, a function $f(x, y)$ is said to be Lebesgue integrable on a rectangle $R$ of the $xy$-plane if there is a sequence of functions $A_n(x, y)$ such that:

(i) $\lim_{n\to\infty} A_n(x, y) = f(x, y)$ for almost all points $(x, y)$ in $R$. That is, for every $\epsilon > 0$ there is a (possibly infinite) collection of subrectangles of $R$ whose total area is less than $\epsilon$ and whose interiors contain all points where $\lim A_n(x, y) = f(x, y)$ is false.

(ii) The integrals $\iint_R A_n(x, y)\,dx\,dy$ converge as ordinary (Riemann) integrals.

(iii) The Cauchy Criterion

$$\lim_{n,m\to\infty} \iint_R |A_n(x, y) - A_m(x, y)|\,dx\,dy = 0$$

is satisfied.

When this is the case, the definition $\iint_R f(x, y)\,dx\,dy = \lim_{n\to\infty} \iint_R A_n(x, y)\,dx\,dy$ is valid, as is proved by the same argument as in the one-dimensional case.

A function $f(x, y)$ is Lebesgue integrable* on the entire $xy$-plane if it is Lebesgue integrable on every rectangle and if $\iint_R f(x, y)\,dx\,dy$ approaches a limit as $R$ becomes large; more precisely, the condition is that for every $\epsilon > 0$ there be a $K$ such that the Lebesgue integrals $\iint_R f(x, y)\,dx\,dy$, $\iint_{R'} f(x, y)\,dx\,dy$ are defined and differ by less than $\epsilon$ whenever $R, R'$ are rectangles containing the square $\{|x| \le K, |y| \le K\}$. The limiting value is in this case denoted $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy$.

*For reasons described in §9.6 this definition is usually used only when the integral is absolutely convergent, that is, only when $\iint |f|\,dx\,dy$ converges as well. Improper integrals which converge but do not converge absolutely (called conditionally convergent improper integrals) must be handled very carefully, whether they be Riemann integrals or Lebesgue integrals.

The Lebesgue Dominated Convergence Theorem also holds for higher dimensions and for improper integrals. (The proofs are easy extensions of the proof below.) Thus the interchange

$$\lim_{n\to\infty} \int_0^n u^x\left(1 - \frac{u}{n}\right)^n du = \int_0^\infty u^x e^{-u}\,du \qquad (x > -1)$$

of Exercise 11 of §9.6 is justified merely by setting

$$f_n(u) = \begin{cases} u^x\left(1 - \dfrac{u}{n}\right)^n & \text{if } u \le n, \\ 0 & \text{if } u > n, \end{cases}$$

$f_\infty(u) = u^x e^{-u}$, showing that the improper integral $\int_0^\infty f_\infty(u)\,du$ converges (for $x > -1$), showing that $0 \le f_n(u) \le f_\infty(u)$, and showing that $\lim_{n\to\infty} f_n(u) = f_\infty(u)$, all of which are elementary. Similarly the interchange

$$\begin{aligned} \int_{-\infty}^{\infty} e^{-t^2} \cos xt\,dt &= \int_{-\infty}^{\infty} \lim_{n\to\infty} e^{-t^2} \sum_{j=0}^{n} (-1)^j \frac{(xt)^{2j}}{(2j)!}\,dt \\ &= \lim_{n\to\infty} \int_{-\infty}^{\infty} e^{-t^2} \sum_{j=0}^{n} (-1)^j \frac{(xt)^{2j}}{(2j)!}\,dt \\ &= \sum_{j=0}^{\infty} (-1)^j \frac{x^{2j}}{(2j)!} \int_{-\infty}^{\infty} e^{-t^2} t^{2j}\,dt \end{aligned}$$

of the last example of §9.6 is justified by showing that the functions involved are dominated by the integrable function $F(t) = e^{-t^2 + |xt|}$. The differentiation under the integral sign of this example is also justified easily using the Lebesgue theorem (Exercise 5).
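Both interchanges can be checked against standard closed forms that are not derived in this section. Substituting $u = ns$ gives $\int_0^n u^x(1 - u/n)^n\,du = n^{x+1}B(x+1, n+1) = n^{x+1}\Gamma(x+1)\Gamma(n+1)/\Gamma(x+n+2)$, which should approach $\Gamma(x+1) = \int_0^\infty u^x e^{-u}\,du$; and $\int_{-\infty}^{\infty} e^{-t^2}t^{2j}\,dt = \Gamma(j + \frac12)$, so the final series should sum to $\sqrt{\pi}\,e^{-x^2/4}$. The sketch below uses Python’s `math.lgamma` and `math.gamma`; the helper names are hypothetical.

```python
import math


def gamma_approx(x, n):
    """Integral from 0 to n of u**x (1 - u/n)**n du, via the Beta function:
    n**(x+1) * Gamma(x+1) * Gamma(n+1) / Gamma(x+n+2), computed with logs."""
    log_value = ((x + 1) * math.log(n) + math.lgamma(x + 1)
                 + math.lgamma(n + 1) - math.lgamma(x + n + 2))
    return math.exp(log_value)


def gaussian_cos_series(x, terms=40):
    """Sum over j of (-1)**j x**(2j)/(2j)! * Gamma(j + 1/2)."""
    return math.fsum((-1) ** j * x ** (2 * j) / math.factorial(2 * j) * math.gamma(j + 0.5)
                     for j in range(terms))


def dominated_gamma(x, n, points=1000):
    """Check 0 <= u**x (1 - u/n)**n <= u**x e**(-u) at sample points 0 < u <= n."""
    for i in range(1, points + 1):
        u = n * i / points
        f_n = u ** x * (1 - u / n) ** n
        if not 0 <= f_n <= u ** x * math.exp(-u) * (1 + 1e-12):
            return False
    return True


for n in (10, 1000, 10 ** 6):
    print(n, gamma_approx(0.5, n), math.gamma(1.5))
print(gaussian_cos_series(1.5), math.sqrt(math.pi) * math.exp(-(1.5 ** 2) / 4))
```

The approximations increase with $n$ because $f_n \le f_{n+1}$ pointwise, which is the monotone situation to which the proof below reduces the theorem.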

Proof of the Lebesgue Dominated Convergence Theorem. Replace $f_n$ by $\min(f_n, F)$. Then $f_n(x) = \min(f_n(x), F(x))$ for almost all $x$ (so by j above the integrals are unchanged), hence $\lim_{n\to\infty} \min(f_n(x), F(x)) = f_\infty(x)$ for almost all $x$ (see Exercise 3). This shows that one can assume at the outset that $f_n(x) \le F(x)$ for all $x$. Similarly one can assume that $f_n(x) \ge -F(x)$.

Define a function $g_n(x)$ by

$$g_n = \lim_{j\to\infty} \max(f_n, f_{n+1}, \ldots, f_{n+j}). \qquad (3)$$

[The limit on the right exists for all $x$ because it is the limit of a non-decreasing sequence of real numbers bounded by $F(x)$.] The functions $\max(f_n, f_{n+1}, \ldots, f_{n+j})$ are Lebesgue integrable; hence, assuming that the theorem is true, $g_n$ is Lebesgue integrable and its integral is the limit of the integrals of $\max(f_n, f_{n+1}, \ldots, f_{n+j})$. On the other hand, it follows from the definition of $\lim f_n(x)$ that

$$f_\infty(x) = \lim_{n\to\infty} g_n(x) \qquad (4)$$

for almost all $x$. Thus, again assuming that the theorem is true, it follows that $f_\infty$ is Lebesgue integrable and that $\int_a^b f_\infty(x)\,dx = \lim_{n\to\infty} \int_a^b g_n(x)\,dx$. In particular, for every $\epsilon > 0$ the inequality

$$\int_a^b f_\infty(x)\,dx + \epsilon \ge \int_a^b g_n(x)\,dx \ge \int_a^b f_n(x)\,dx$$

holds for all sufficiently large $n$.

Similarly the functions

$$h_n = \lim_{j\to\infty} \min(f_n, f_{n+1}, \ldots, f_{n+j}) \qquad (3')$$

are Lebesgue integrable,

$$f_\infty(x) = \lim_{n\to\infty} h_n(x) \qquad (4')$$

for almost all $x$, and therefore $f_\infty(x)$ is integrable and the inequality

$$\int_a^b f_\infty(x)\,dx - \epsilon \le \int_a^b h_n(x)\,dx \le \int_a^b f_n(x)\,dx$$

holds for all sufficiently large $n$. Thus $\lim_{n\to\infty} \int_a^b f_n(x)\,dx$ exists and is equal to $\int_a^b f_\infty(x)\,dx$ as was to be shown.

This shows that the theorem will be proved if it is proved for the sequences (3), (4), (3'), (4'). These sequences have the added property of being monotone, that is

$$\max(f_n, f_{n+1}, \ldots, f_{n+j}) \le \max(f_n, f_{n+1}, \ldots, f_{n+j+1}), \qquad g_n(x) \ge g_{n+1}(x), \quad \text{etc.}$$

for all $x$. Thus the general theorem is reduced to the case of monotone sequences.
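At a single fixed $x$, the construction (3), (3') replaces the numbers $f_n(x)$ by tail maxima and tail minima, which are monotone and squeeze together at the limit. The sketch below is an illustration only: it works with a finite list of numbers, so the tail maxima and minima of that list stand in for the limits over $j$ in (3) and (3'), and the sample sequence $1 + (-1)^k/k$ is an arbitrary choice.

```python
def tail_envelopes(values):
    """Return (g, h) with g[n] = max(values[n:]) and h[n] = min(values[n:]).

    For a finite list these are the finite analogues of (3) and (3'):
    g is non-increasing, h is non-decreasing, and h[n] <= values[n] <= g[n].
    """
    g, h = [], []
    hi, lo = float("-inf"), float("inf")
    for v in reversed(values):
        hi = max(hi, v)
        lo = min(lo, v)
        g.append(hi)
        h.append(lo)
    return g[::-1], h[::-1]


# Values f_k(x) at one fixed x; their limit is 1.
values = [1 + (-1) ** k / k for k in range(1, 2001)]
g, h = tail_envelopes(values)
print(g[0], h[0], g[999], h[999])
```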

It suffices, therefore, to consider the case where the given sequence $f_n(x)$ of Lebesgue integrable functions, $|f_n(x)| \le F(x)$, satisfies the additional condition $f_n(x) \le f_{n+1}(x)$ for all $n$ and $x$. (The case $f_n(x) \ge f_{n+1}(x)$ is reduced to this case by multiplying by $-1$.) In this case the integrals are a bounded, increasing sequence

$$\int_a^b f_1(x)\,dx \le \int_a^b f_2(x)\,dx \le \cdots \le \int_a^b F(x)\,dx.$$

Such a sequence is necessarily convergent, hence for every $\epsilon$ there is an $N$ such that

$$\int_a^b f_n(x)\,dx - \int_a^b f_m(x)\,dx < \epsilon$$

whenever $n \ge m \ge N$. But $f_n(x) \ge f_m(x)$ for $n \ge m$, hence

$$\int_a^b |f_n(x) - f_m(x)|\,dx = \int_a^b (f_n(x) - f_m(x))\,dx,$$

and it follows that the sequence $f_n(x)$ satisfies the Cauchy Criterion

$$\lim_{n,m\to\infty} \int_a^b |f_n(x) - f_m(x)|\,dx = 0.$$

This reduces the theorem to the following theorem, which is important in its own right.




Theorem 


Completeness of the space of Lebesgue integrable functions. Let $f_1, f_2, f_3, \ldots$ be a sequence of Lebesgue integrable functions satisfying the Cauchy Criterion

$$\lim_{n,m\to\infty} \int_a^b |f_n - f_m|\,dx = 0.$$

Then there is a Lebesgue integrable function $g$ such that $\lim_{n\to\infty} \int_a^b |f_n - g|\,dx = 0$. Moreover, if $g$ is any Lebesgue integrable function with this property then a suitable subsequence of $f_n(x)$ (for instance the ‘weeded out’ sequence used in the proof below) converges to $g(x)$ for almost all $x$ in $\{a \le x \le b\}$.

Using this theorem, the proof of the Lebesgue Dominated Convergence Theorem is completed by noting that the given monotone sequence $f_n(x)$ satisfies $\lim f_n(x) = f_\infty(x)$ for almost all $x$ (by assumption), so that every subsequence of it does too, and that a suitable subsequence converges to $g(x)$ for almost all $x$ (where $g(x)$ is the Lebesgue integrable function whose existence is asserted by the Completeness Theorem). Thus $f_\infty(x) = g(x)$ for almost all $x$, so $f_\infty(x)$ is Lebesgue integrable and $\int_a^b f_\infty(x)\,dx = \int_a^b g(x)\,dx$. Since

$$\lim_{n\to\infty} \left|\int_a^b f_n\,dx - \int_a^b g\,dx\right| \le \lim_{n\to\infty} \int_a^b |f_n - g|\,dx = 0,$$

it follows that $\int_a^b f_\infty\,dx = \int_a^b g\,dx = \lim_{n\to\infty} \int_a^b f_n\,dx$, as desired.

The proof of the completeness theorem resembles, as is to be expected, the proof of the completeness of the real number system (§9.1). First let the given sequence $f_n(x)$ be ‘weeded out’ so that

$$\int_a^b |f_n(x) - f_{n+1}(x)|\,dx < 10^{-n}.$$

For each $n$ let $A_{n,1}(x), A_{n,2}(x), A_{n,3}(x), \ldots$ be a sequence of Riemann integrable functions such that

$$\int_a^b |A_{n,j}(x) - A_{n,j+1}(x)|\,dx < 10^{-j}$$

and such that $\lim_{j\to\infty} A_{n,j}(x) = f_n(x)$ for almost all $x$. (Such sequences $A_{n,j}$ exist by the assumption that $f_n$ is Lebesgue integrable.) The method of proof will be to show that the ‘diagonal’ sequence $A_{1,1}(x), A_{2,2}(x), A_{3,3}(x), \ldots$ converges for almost all $x$ and thereby defines a function $g$ with the desired properties.

To this end, note first that the diagonal sequence $A_{n,n}$ satisfies the Cauchy Criterion; in fact, for $j > k \ge 1$,

$$\begin{aligned} \int_a^b |A_{n,n}(x) - A_{n+k,n+k}(x)|\,dx &\le \int_a^b |A_{n,n}(x) - A_{n,n+j}(x)|\,dx + \int_a^b |A_{n,n+j}(x) - A_{n+k,n+j}(x)|\,dx \\ &\quad + \int_a^b |A_{n+k,n+j}(x) - A_{n+k,n+k}(x)|\,dx \\ &< \tfrac{10}{9} \cdot 10^{-n} + \int_a^b |A_{n,n+j}(x) - A_{n+k,n+j}(x)|\,dx + \tfrac{10}{9} \cdot 10^{-n-k}. \end{aligned}$$

As $j \to \infty$ the middle term approaches $\int_a^b |f_n(x) - f_{n+k}(x)|\,dx$ (by the definition of this integral), which is less than $\frac{10}{9} \cdot 10^{-n}$; hence for large $j$ it is less than $2 \cdot 10^{-n}$ and

$$\int_a^b |A_{n,n}(x) - A_{n+k,n+k}(x)|\,dx < 4 \cdot 10^{-n} \qquad (5)$$

for all $k$.
To show that $\lim_{n\to\infty} A_{n,n}(x)$ exists for almost all $x$, consider the set where $|A_{n,n}(x) - A_{n+1,n+1}(x)| \ge 2^{-n}$. The argument used in the proof of the main theorem applied to the integral (5) shows that this set can be contained in the interior of a (finite) collection $U_n$ of intervals of total length at most $4 \cdot 5^{-n}$. Let $V_n$ denote the collection of all the intervals of $U_n, U_{n+1}, U_{n+2}, \ldots$. The total length of the intervals of $V_n$ is at most

$$4 \cdot 5^{-n}(1 + 5^{-1} + 5^{-2} + \cdots) = 4 \cdot 5^{-n} \cdot \tfrac54 = 5 \cdot 5^{-n},$$

and if $x$ is any point not in $V_n$ then

$$\begin{aligned} |A_{n,n}(x) - A_{n+k,n+k}(x)| &\le |A_{n,n}(x) - A_{n+1,n+1}(x)| + \cdots + |A_{n+k-1,n+k-1}(x) - A_{n+k,n+k}(x)| \\ &< 2^{-n} + 2^{-n-1} + \cdots + 2^{-n-k+1} < 2 \cdot 2^{-n}. \end{aligned}$$

Thus the Cauchy Criterion is satisfied and $\lim_{n\to\infty} A_{n,n}(x)$ exists for all $x$ not in $V_n$. Since $V_n$ has arbitrarily small total length, this proves that $\lim A_{n,n}(x)$ exists for almost all $x$. Let $g(x)$ be the function which is $\lim A_{n,n}(x)$ when this limit exists and which is 0 otherwise. Then $|A_{n,n}(x) - g(x)| \le 2 \cdot 2^{-n}$ whenever $x$ is not in $V_n$.
By exactly the same argument applied to the sequence $A_{n,1}, A_{n,2}, \ldots, A_{n,j}, \ldots$, it follows that $\lim_{j\to\infty} A_{n,j}(x)$ exists for almost all $x$ and that there is a collection $W'_{n,j}$ of intervals with total length less than $\frac54 \cdot 5^{-j}$ such that

$$\left|A_{n,j}(x) - \lim_{i\to\infty} A_{n,i}(x)\right| \le 2 \cdot 2^{-j}$$

whenever $x$ is not in $W'_{n,j}$. Since $\lim_{j\to\infty} A_{n,j}(x) = f_n(x)$ for almost all $x$, the set where this fails can be contained in a collection of intervals of arbitrarily small total length, say $< \frac34 \cdot 5^{-j}$, which can be added to $W'_{n,j}$ to give a collection $W_{n,j}$ of total length $< 2 \cdot 5^{-j}$ such that

$$|A_{n,j}(x) - f_n(x)| \le 2 \cdot 2^{-j}$$

whenever $x$ is not in $W_{n,j}$.


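Finally, let $Z_n$ be the collection of all intervals in $V_n, W_{n,n}, W_{n+1,n+1}, W_{n+2,n+2}, \ldots$. Then $Z_n$ has total length

$$< 5 \cdot 5^{-n} + 2 \cdot 5^{-n} + 2 \cdot 5^{-n-1} + 2 \cdot 5^{-n-2} + \cdots = 5 \cdot 5^{-n} + 2 \cdot \tfrac54 \cdot 5^{-n} < 10 \cdot 5^{-n}.$$

If $x$ is not in $Z_n$ and if $m \ge n$ then $x$ is not in $V_m$ or $W_{m,m}$, and

$$|f_m(x) - g(x)| \le |f_m(x) - A_{m,m}(x)| + |A_{m,m}(x) - g(x)| \le 2 \cdot 2^{-m} + 2 \cdot 2^{-m} = 4 \cdot 2^{-m},$$

so $\lim f_m(x)$ exists and is equal to $g(x)$. This proves that $\lim_{n\to\infty} f_n(x) = g(x)$ for almost all $x$. Moreover, the sequence $A_{n,n}(x)$ shows that $g(x)$ is Lebesgue integrable and that, for $j > k$,

$$\begin{aligned} \int_a^b |A_{n,n+j} - A_{n+k,n+k}|\,dx &\le \int_a^b |A_{n,n+j} - A_{n+k,n+j}|\,dx + \int_a^b |A_{n+k,n+j} - A_{n+k,n+k}|\,dx \\ &\le \int_a^b |A_{n,n+j} - A_{n+k,n+j}|\,dx + \tfrac{10}{9} \cdot 10^{-n-k}. \end{aligned}$$

Hence as $j \to \infty$

$$\int_a^b |f_n - A_{n+k,n+k}|\,dx \le \int_a^b |f_n - f_{n+k}|\,dx + \tfrac{10}{9} \cdot 10^{-n-k} < \tfrac{10}{9}\left(10^{-n} + 10^{-n-k}\right).$$

Letting $k \to \infty$ gives

$$\int_a^b |f_n - g|\,dx \le \tfrac{10}{9} \cdot 10^{-n},$$

hence as $n \to \infty$

$$\lim_{n\to\infty} \int_a^b |f_n - g|\,dx = 0.$$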


If $\bar g$ is any other function satisfying

$$\lim_{n\to\infty} \int_a^b |f_n - \bar g|\,dx = 0,$$

then

$$\int_a^b |g - \bar g|\,dx \le \int_a^b |g - f_n|\,dx + \int_a^b |f_n - \bar g|\,dx,$$

hence $\int_a^b |g - \bar g|\,dx = 0$. It is to be shown that $\lim f_n(x) = \bar g(x)$ for almost all $x$, that is, that $g(x) = \bar g(x)$ for almost all $x$. The proof of the theorem will therefore be completed by the following lemma.


Lemma 


If $h$ is a Lebesgue integrable function such that $h(x) \ge 0$ and such that $\int_a^b h(x)\,dx = 0$ then $h(x) = 0$ for almost all $x$ in $\{a \le x \le b\}$.


Proof 


Let $A_n(x)$ be a sequence of Riemann integrable functions satisfying the Cauchy Criterion such that $\lim A_n(x) = h(x)$ for almost all $x$. By taking $|A_n|$ it can be assumed that $A_n \ge 0$. By assumption, $\lim_{n\to\infty} \int_a^b A_n(x)\,dx = 0$. By ‘weeding out’ the $A$’s it can be assumed that

$$\int_a^b A_n(x)\,dx < 10^{-n}.$$

It follows that the set where $A_n(x) \ge 2^{-n}$ can be contained in the interiors of a finite collection $U_n$ of intervals of total length $< 5^{-n}$. Let $V_n$ denote the collection of intervals in $U_n, U_{n+1}, U_{n+2}, \ldots$. If $x$ is not in $V_n$ then it is not in $U_m$ for $m \ge n$, hence $0 \le A_m(x) < 2^{-m}$ and $\lim_{m\to\infty} A_m(x) = 0$. Thus $\lim_{m\to\infty} A_m(x) = 0$ for almost all $x$ and $h(x) = 0$ for almost all $x$ as was to be shown.

The theory of Lebesgue integration can be formulated in many ways. For an especially clear exposition of two formulations different from the one given here (including Lebesgue’s original formulation in terms of the theory of measure), see B. Sz.-Nagy, Introduction to Real Functions and Orthogonal Expansions, Oxford University Press, 1965.

Exercises


1 Reviewing Chapter 2, show that a function $f(x)$ on $\{a \le x \le b\}$ is Riemann integrable, i.e. $\int_a^b f(x)\,dx$ converges in the ordinary sense, if and only if the following condition is satisfied: For every $\epsilon > 0$ there is a $\delta > 0$ such that

$$\sum |f(x_i) - f(x_i')|\,\Delta x_i < \epsilon \qquad (*)$$

whenever $\{a \le x \le b\}$ is divided into subintervals whose lengths $\Delta x_i$ are all less than $\delta$ and whenever points $x_i, x_i'$ are chosen in each interval to form the sum (*). Conclude that the relations listed under application f are valid for Riemann integrable functions.


2 Show that if $f(x) = g(x)$ for almost all $x$ in $\{a \le x \le b\}$ and if $g(x) = h(x)$ for almost all $x$, then $f(x) = h(x)$ for almost all $x$.


3 Show that if $f_n(x) \to f_\infty(x)$ for almost all $x$ and if, for each $n$, $f_n(x) = g_n(x)$ for almost all $x$, then $g_n(x) \to f_\infty(x)$ for almost all $x$. [Use $\epsilon = \epsilon/2 + \epsilon/4 + \epsilon/8 + \cdots$.] Conclude that the function of Example d is zero for almost all $x$.


4 Find an example of a sequence of continuous functions $f_n(x)$ on $\{0 \le x \le 1\}$ which converge to zero, $f_n(x) \to 0$ for all $x$ in the interval, but for which $\lim_{n\to\infty} \int_0^1 f_n(x)\,dx \ne 0$.


5 Use the Lebesgue Dominated Convergence Theorem to prove that the equations obtained by differentiation under the integral sign in the last example of §9.6 are valid. [The inequalities $-x^2 \le n(e^{-x^2/n} - 1) \le -x^2 + x^4$ can be used.]


6 Show that $|x|^{-a}$ is not Lebesgue integrable on $\{-1 \le x \le 1\}$ for $a \ge 1$. [If it were, the Lebesgue Dominated Convergence Theorem could be contradicted.]


9.8 | Banach Spaces 447 


Banach Spaces 


*See §4.5 for the definition of ‘vector 
space’. 


7 Let $r = \sqrt{x^2 + y^2}$. For what values of $a$ is the function $r^{-a}$ Lebesgue integrable on $\{-1 \le x \le 1, -1 \le y \le 1\}$?


8 Let $r = \sqrt{x^2 + y^2 + z^2}$. For what values of $a$ is the function $r^{-a}$ Lebesgue integrable on the cube $\{|x| \le 1, |y| \le 1, |z| \le 1\}$?


9 Prove the Lebesgue Dominated Convergence Theorem for improper integrals

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} f_n\,dx = \int_{-\infty}^{\infty} f_\infty\,dx.$$

[Use the theorem for finite intervals. If $\int_{-K}^{K} F\,dx$ is within $\epsilon$ of its limiting value then the same is true of any integral which it dominates.]


10 Prove Beppo Levi’s Theorem: If $f_{n+1}(x) \ge f_n(x)$ is a monotone sequence of Lebesgue integrable functions on $\{a \le x \le b\}$ such that $\int_a^b f_n(x)\,dx$ is bounded for $n = 1, 2, 3, \ldots$, then $f_n(x)$ converges for almost all $x$ to a Lebesgue integrable function $f(x)$.


11 Prove Arzelà’s Theorem: If a sequence of Riemann integrable functions is uniformly bounded on the interval $\{a \le x \le b\}$ and if it converges to a limit function $f$ which is also Riemann integrable on $\{a \le x \le b\}$, then $\lim_{n\to\infty} \int_a^b f_n\,dx = \int_a^b f\,dx$. [Without the theory of Lebesgue integration this theorem is quite difficult; with it, it is trivial.]


12 Prove Fatou’s Lemma: If $f_n \ge 0$ is a sequence of non-negative Lebesgue integrable functions which converge, $f_n(x) \to f(x)$ for almost all $x$ in $\{a \le x \le b\}$, to a limit $f(x)$, and if $\int_a^b f_n(x)\,dx \le K$ for all $n$, then $f$ is Lebesgue integrable and $\int_a^b f(x)\,dx \le K$. [Use the method of the proof of the Lebesgue Dominated Convergence Theorem and the fact that $f_n \ge 0$.]


9.8 


A norm on a vector space* is a function which assigns real numbers $|x|$ to elements $x$ of the vector space in such a way that:

(i) $|ax| = |a|\,|x|$, where $|a|$ denotes the absolute value of the real number $a$ and $|x|, |ax|$ denote the norms of the elements $x, ax$ of the vector space.




*The vector space of $n$-tuples of real numbers was denoted $V_n$ in §4.5. Actually there is a useful distinction between $\mathbb{R}^n$ and $V_n$, which is the distinction between an affine space and a vector space; briefly this distinction is that $V_n$ has the ‘origin’ or ‘zero vector’ $(0, 0, \ldots, 0)$ as a special point but $\mathbb{R}^n$ has no special points. This distinction will be ignored here and $\mathbb{R}^n$ will denote the vector space.




(ii) Triangle inequality. The inequality $|x + y| \le |x| + |y|$ holds for all elements $x, y$ of the vector space.

(iii) $|x - y| = 0$ implies $x = y$.

Intuitively the norm $|x|$ is thought of as the length or size of $x$, and the norm $|x - y|$ is thought of as the distance from $x$ to $y$ or the size of their difference.

A Banach space is a vector space for which a norm is given, with respect to which the space is complete, that is, with respect to which every sequence $x_1, x_2, x_3, \ldots$ of elements of the space satisfying the Cauchy Criterion $\lim_{n,m\to\infty} |x_n - x_m| = 0$ has a limit $x_\infty$, satisfying

$$\lim_{n\to\infty} |x_n - x_\infty| = 0.$$

For $x$ an $n$-tuple of real numbers $x = (x_1, x_2, \ldots, x_n)$ the notation $|x|$ has often been used above to denote the maximum of the absolute values

$$|x| = \max(|x_1|, |x_2|, \ldots, |x_n|). \qquad (1)$$

The properties (i)–(iii) and the completeness axiom hold, so that* $\mathbb{R}^n$ with this norm is a Banach space. This Banach space has formed the basis of most of the theorems and proofs of this book.

For any real number $p \ge 1$ the ‘p-norm’
$$|x|_p = \left(|x_1|^p + |x_2|^p + \cdots + |x_n|^p\right)^{1/p}$$
also defines a Banach space structure on $\mathbb{R}^n$. The triangle inequality $|x + y|_p \le |x|_p + |y|_p$ is Minkowski’s Inequality, proved in Chapter 5 by the method of Lagrange multipliers (§5.4, Exercise 9). For $p = 2$ the p-norm is the Euclidean distance and $|x + y|_2 \le |x|_2 + |y|_2$ is the ordinary triangle inequality. For any fixed $x$ in $\mathbb{R}^n$,
$$\lim_{p \to \infty} |x|_p = |x|,$$
where $|x|$ is the norm (1) (Exercise 7, §5.4).

For this reason the norm (1) is also denoted $|x|_\infty$, to distinguish it from the other possible norms on $\mathbb{R}^n$. The proof that $\mathbb{R}^n$ is complete in the norm $|x|_p$ is based on the simple inequalities
$$|x|_\infty \le |x|_p \le n^{1/p}|x|_\infty. \tag{2}$$
The first inequality shows that if $\lim |x_n - x_m|_p = 0$ then $\lim |x_n - x_m|_\infty = 0$; hence there is an $x_\infty$ satisfying $\lim |x_n - x_\infty|_\infty = 0$, which implies, by the second inequality, that $\lim |x_n - x_\infty|_p = 0$.
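The inequalities (2) and the limit $\lim_{p \to \infty} |x|_p = |x|_\infty$ are easy to observe numerically. The sketch below is a plain-Python illustration; the sample vector and the list of exponents are arbitrary choices, and a small relative tolerance absorbs floating-point rounding.

```python
def p_norm(x, p):
    """|x|_p = (|x_1|^p + ... + |x_n|^p)^(1/p) for a real p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def max_norm(x):
    """|x|_inf = max(|x_1|, ..., |x_n|), the norm (1)."""
    return max(abs(xi) for xi in x)

def check_inequality_2(x, p, rel_tol=1e-12):
    """Check |x|_inf <= |x|_p <= n^(1/p) |x|_inf, allowing for rounding."""
    n = len(x)
    m, lp = max_norm(x), p_norm(x, p)
    return m <= lp * (1 + rel_tol) and lp <= n ** (1.0 / p) * m * (1 + rel_tol)

x = [3.0, -7.0, 2.0]
for p in [1, 2, 4, 16, 64]:
    # |x|_p decreases toward |x|_inf = 7 as p grows
    # (for very large p, |x_i| ** p can overflow a float)
    print(p, p_norm(x, p), check_inequality_2(x, p))
```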


9.8 | Banach Spaces 




An example of an infinite-dimensional Banach space is provided by the space of all continuous functions $x(t)$ defined on the interval $\{a \le t \le b\}$ with the norm
$$|x|_\infty = \max_{a \le t \le b} |x(t)|. \tag{3}$$


To say that $x$ is ‘small’ in this norm, say $|x|_\infty < \epsilon$, means that all values $x(t)$ of $x$ are less than $\epsilon$ in absolute value, i.e. $x$ is uniformly small for all $t$ in $\{a \le t \le b\}$. For this reason the norm (3) is called the ‘uniform norm’. The fact that the space of all continuous functions is complete with respect to the uniform norm (3) is the important theorem: A uniform limit of continuous functions is a continuous function. More specifically, if $x_1, x_2, x_3, \dots$ is a sequence of continuous functions such that for every $\epsilon > 0$ there is an $N$ for which
$$|x_n - x_m|_\infty = \max_{a \le t \le b} |x_n(t) - x_m(t)| < \epsilon$$
whenever $n, m > N$, then the limit function $x_\infty(t)$ (which clearly exists and satisfies $\lim_{n \to \infty} |x_n - x_\infty|_\infty = 0$) is continuous. This is proved simply by writing
$$|x_\infty(t + h) - x_\infty(t)| \le |x_\infty(t + h) - x_n(t + h)| + |x_n(t + h) - x_n(t)| + |x_n(t) - x_\infty(t)|$$
and noting that the first and last terms can be made small by making $n$ large, after which the middle term (with $n$ and $t$ fixed) can be made small by making $|h|$ small; thus $|x_\infty(t + h) - x_\infty(t)|$ can be made small for fixed $t$ by making $|h|$ small, which is the definition of continuity.
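Spelled out with explicit tolerances (splitting $\epsilon$ into thirds is a presentational choice made here, not something the text specifies), the argument reads:

```latex
\begin{aligned}
&\text{Given } \epsilon > 0, \text{ choose } n \text{ with } |x_n(s) - x_\infty(s)| < \tfrac{\epsilon}{3}
  \text{ for every } s \text{ in } \{a \le s \le b\}.\\
&\text{With } n \text{ and } t \text{ fixed, continuity of } x_n \text{ at } t \text{ gives a } \delta > 0
  \text{ with } |x_n(t+h) - x_n(t)| < \tfrac{\epsilon}{3} \text{ whenever } |h| < \delta.\\
&\text{Hence, for } |h| < \delta \text{ (with } t + h \text{ in } \{a \le s \le b\}\text{):}\\
&|x_\infty(t+h) - x_\infty(t)|
  \le |x_\infty(t+h) - x_n(t+h)| + |x_n(t+h) - x_n(t)| + |x_n(t) - x_\infty(t)|
  < \tfrac{\epsilon}{3} + \tfrac{\epsilon}{3} + \tfrac{\epsilon}{3} = \epsilon.
\end{aligned}
```

The essential point is that the choice of $n$ in the first line does not depend on $t$ or $h$; this is exactly where uniform convergence is used.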

The analogous generalization of the 1-norm $|x|_1 = |x_1| + |x_2| + \cdots + |x_n|$ to functions $x(t)$ would be
$$|x|_1 = \int_a^b |x(t)|\,dt. \tag{4}$$


It is easily shown that this is a norm on the vector space of continuous functions, but the continuous functions are not complete in this norm. On the vector space of Riemann integrable functions the function (4) is no longer a norm (condition (iii) is violated) and the space is still not complete. However, the function (4) is defined on the vector space of Lebesgue integrable functions on $\{a \le t \le b\}$ and is a norm provided that two Lebesgue integrable functions $x(t), y(t)$ are considered to be ‘the same’ whenever $\int_a^b |x(t) - y(t)|\,dt = 0$, that is, whenever $x(t) = y(t)$ for almost all $t$. The completeness theorem of §9.7 shows that the Lebesgue integrable functions are complete in the norm (4) and hence are a Banach space. This Banach space is denoted $L^1[a, b]$.

*Actually the theorems listed below depend on even less; they depend only on the affine space structure of $\mathbb{R}^n$. That is, $(0, 0, \dots, 0)$ plays no special role, except that a ‘k-form on $E$ with values in $F$’ uses the vector space structure of $F$.

In a similar fashion, one can define the Banach space $L^1[-\infty, \infty]$ (= space of functions $x(t)$ for which the Lebesgue integral $\int_{-\infty}^{\infty} |x(t)|\,dt$ is defined) and Banach spaces of Lebesgue integrable functions of several variables. The p-norm
$$|x|_p = \left(\int_a^b |x(t)|^p\,dt\right)^{1/p}$$
($1 \le p < \infty$) on the space of all functions $x$ for which the Lebesgue integral $\int_a^b |x(t)|^p\,dt$ is defined can also be shown to satisfy the axioms of a Banach space. This Banach space is denoted $L^p[a, b]$.
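To see concretely why the continuous functions are not complete in the norm (4), one can use hypothetical ‘ramp’ functions on $\{0 \le t \le 1\}$ (an example chosen here, not taken from the text): $x_n(t)$ is 0 for $t \le \tfrac12$, 1 for $t \ge \tfrac12 + \tfrac1n$, and linear in between. For $m > n$ one has $x_m \ge x_n$ and $|x_n - x_m|_1 = \frac{1}{2n} - \frac{1}{2m}$, so the sequence satisfies the Cauchy Criterion in the norm (4); but it converges in this norm to the discontinuous step function, and no continuous function is at distance 0 from that. The sketch checks the distances with a midpoint-rule approximation, which is a numerical stand-in for the exact integral.

```python
def ramp(n):
    """Hypothetical continuous function on [0, 1]: 0 for t <= 1/2,
    1 for t >= 1/2 + 1/n, linear in between."""
    def x(t):
        return min(1.0, max(0.0, n * (t - 0.5)))
    return x

def l1_distance(f, g, a=0.0, b=1.0, steps=100_000):
    """Midpoint-rule approximation to the norm (4) of f - g on [a, b]."""
    h = (b - a) / steps
    return h * sum(abs(f(a + (k + 0.5) * h) - g(a + (k + 0.5) * h))
                   for k in range(steps))

for n, m in [(10, 20), (100, 200), (1000, 2000)]:
    exact = 1 / (2 * n) - 1 / (2 * m)   # |x_n - x_m|_1 for m > n
    print(n, m, l1_distance(ramp(n), ramp(m)), exact)
```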

All definitions and theorems concerning the Banach space $\mathbb{R}^n$ with the norm $|x|_\infty = \max(|x_1|, |x_2|, \dots, |x_n|)$ can be extended immediately to arbitrary Banach spaces, provided that they depend only on the Banach space structure of $\mathbb{R}^n$, that is, provided that they depend only on the vector space operations,* on the properties (i)-(iii) of the norm, and on completeness. For example: A function $f: E \to F$ which assigns elements of a Banach space $F$ to elements of a Banach space $E$ is said to be uniformly continuous on a subset $X$ of $E$ if it is defined for all $x$ in $X$ and if for every $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x') - f(x)| < \epsilon$ whenever $x', x$ are elements of $X$ for which $|x' - x| < \delta$. (Here $|f(x') - f(x)|$ denotes the norm of the element $f(x') - f(x)$ of the Banach space $F$, and $|x' - x|$ the norm of $x' - x$ in $E$.) The closure $\bar{X}$ of a subset $X$ of a Banach space $E$ is the set of all elements of $E$ which can be written as limits of sequences in $X$. If $f: E \to F$ is uniformly continuous on $X$, then the formula $f(\lim x_n) = \lim f(x_n)$ defines a uniformly continuous extension of $f$ to $\bar{X}$.
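A simple instance of this extension, using a hypothetical example not in the text: $f(x) = \sin x / x$ is uniformly continuous on $X = \{0 < x \le 1\}$, and $0$ lies in the closure $\bar{X}$ but not in $X$. Evaluating $f$ along a sequence $x_n \to 0$ in $X$ shows the values settling down, as uniform continuity guarantees; their limit (which is 1) is the value of the extension at 0. The sketch shows this numerically; it is an illustration, not a proof.

```python
import math

def f(x):
    """Hypothetical example: sin(x)/x, uniformly continuous on X = {0 < x <= 1}
    but not given by this formula at 0, which lies in the closure of X."""
    return math.sin(x) / x

# x_n = 2**-n is a sequence in X with limit 0. Uniform continuity makes the
# values f(x_n) a Cauchy sequence; their limit is the extended value f(0).
values = [f(2.0 ** -n) for n in range(1, 30)]
for n in (1, 5, 10, 20, 29):
    print(n, values[n - 1])
```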

A function $f: E \to F$ from a Banach space $E$ to a Banach space $F$ is said to be uniformly differentiable on a subset $X$ of $E$ if there is a $\delta > 0$ such that the function
$$M_{s,x}(h) = \frac{f(x + sh) - f(x)}{s}$$
is defined and uniformly continuous on the set of triples $(s, x, h)$ in which $s$ is a real number $0 < |s| < \delta$, in which $x$ is an element of $X$, and in which $h$ is an element of $E$ with $|h| < 1$. (The norm of a triple $(s, x, h)$ is $\max(|s|, |x|, |h|)$. See Exercise 7.) When this is the case the extension of $M_{s,x}(h)$ to the set $s = 0$ defines a function
$$L_x(h) = \lim_{s \to 0} \frac{f(x + sh) - f(x)}{s} \tag{5}$$
with values in $F$. The function $L_x(h)$ is linear in $h$ (see Exercise 4) and uniformly continuous in $(x, h)$ relative to the norm $\max(|x|, |h|)$ for $x$ in $X$ and for all $h$.

*If $E$ is a finite-dimensional Banach space then a linear map $L: E \to F$ is necessarily continuous (Exercise 20). If $E$ is not finite-dimensional this is no longer the case.

A function $f: E \to F$ is said to be continuous on a set $X$ if it is defined for all $x$ in $X$ and if for every $x$ in $X$ and $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x') - f(x)| < \epsilon$ whenever $x'$ is in $X$ and $|x' - x| < \delta$. A function $f: E \to F$ is said to be continuous at a point $x$ if for every $\epsilon > 0$ there is a $\delta > 0$ such that $f(x')$ is defined and satisfies $|f(x') - f(x)| < \epsilon$ whenever $x'$ is in $E$ and $|x' - x| < \delta$.

A function $f: E \to F$ is said to be differentiable at a point $x$ if there is a continuous* linear map $L_x: E \to F$ with the property that for every $\epsilon > 0$ there is a $\delta > 0$ such that $f(x + sh)$ is defined and satisfies
$$\left|\frac{f(x + sh) - f(x)}{s} - L_x(h)\right| < \epsilon$$
whenever $0 < |s| < \delta$ and $|h| < 1$. It is differentiable on a set $X$ if it is differentiable at every point $x$ of $X$. It is continuously differentiable on $X$ if it is differentiable on $X$ and if the derivative $L_x$ is a continuous function of $x$ with values in the Banach space $L(E, F)$ of Exercise 9.
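As a finite-dimensional sanity check of this definition, take $E = F = \mathbb{R}^2$ with the max norm and the hypothetical map $f(x_1, x_2) = (x_1^2 - x_2, x_1x_2)$ (an example chosen here), whose derivative $L_x(h) = (2x_1h_1 - h_2, x_2h_1 + x_1h_2)$ was computed by hand. The difference quotient minus $L_x(h)$ is exactly $s(h_1^2, h_1h_2)$, so its largest norm over $|h| \le 1$ is $|s|$, and $\delta = \epsilon$ satisfies the definition. The sketch samples $h$ on a finite grid, which only approximates the supremum (here the grid contains the corners, where it is attained).

```python
import itertools

def f(x):
    """Hypothetical map R^2 -> R^2: f(x1, x2) = (x1**2 - x2, x1 * x2)."""
    x1, x2 = x
    return (x1 ** 2 - x2, x1 * x2)

def derivative_of_f(x, h):
    """L_x(h) for this f, computed by hand: (2 x1 h1 - h2, x2 h1 + x1 h2)."""
    x1, x2 = x
    h1, h2 = h
    return (2 * x1 * h1 - h2, x2 * h1 + x1 * h2)

def max_norm(v):
    return max(abs(c) for c in v)

def worst_remainder(x, s, grid=21):
    """Largest |(f(x + s h) - f(x))/s - L_x(h)| (max norm) over a grid of h with |h| <= 1."""
    ticks = [-1 + 2 * k / (grid - 1) for k in range(grid)]
    fx = f(x)
    worst = 0.0
    for h in itertools.product(ticks, repeat=2):
        fxs = f((x[0] + s * h[0], x[1] + s * h[1]))
        quotient = ((fxs[0] - fx[0]) / s, (fxs[1] - fx[1]) / s)
        Lh = derivative_of_f(x, h)
        worst = max(worst, max_norm((quotient[0] - Lh[0], quotient[1] - Lh[1])))
    return worst

for s in [0.1, 0.01, 0.001]:
    print(s, worst_remainder((1.0, 2.0), s))   # close to s itself
```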

The notion of the ‘rank’ of a mapping $f: E \to F$ is not defined in general, and the statement of the Implicit Function Theorem must be revised and weakened somewhat:


Implicit Function Theorem


Let $f: E_1 \times E_2 \to F$ be a mapping which assigns elements $y = f(x_1, x_2)$ in a Banach space $F$ to pairs of elements $(x_1, x_2)$ with $x_1$ in a Banach space $E_1$ and $x_2$ in a Banach space $E_2$. Let $f$ be continuously differentiable on a ‘cube’ $\{|x_1 - \bar{x}_1| < K, |x_2 - \bar{x}_2| < K\}$ containing $(\bar{x}_1, \bar{x}_2)$ and let $L: E_1 \times E_2 \to F$ be its derivative at $(\bar{x}_1, \bar{x}_2)$. Then $y = f(x_1, x_2)$ can be solved locally for $x_1 = g(y, x_2)$ if and only if $k = L(h, 0)$ can be solved for $h = M(k)$. That is, if there is a continuous linear function $M: F \to E_1$ such that the relations $k = L(h, 0)$ and $h = M(k)$ are equivalent, then there are an $\epsilon > 0$ and a continuously differentiable map $g: F \times E_2 \to E_1$ such that the relations $y = f(x_1, x_2)$ and $x_1 = g(y, x_2)$ are defined and equivalent for all $(x_1, x_2, y)$ satisfying $|x_1 - \bar{x}_1| < \epsilon$, $|x_2 - \bar{x}_2| < \epsilon$, $|y - f(\bar{x}_1, \bar{x}_2)| < \epsilon$. Conversely, if such $\epsilon$, $g$ exist, so must such an $M$.

The proof is by successive approximations as before.

A ‘continuous k-form $\omega = A\,dx_1\,dx_2 \dots dx_k + \cdots$ on $\mathbb{R}^n$ with values in a Banach space $F$’ is simply a k-form in which the coefficient functions $A$ are continuous functions $A: \mathbb{R}^n \to F$ rather than $A: \mathbb{R}^n \to \mathbb{R}$. Given such a k-form and given a compact, oriented, differentiable k-manifold $S$ in $\mathbb{R}^n$, an element $\int_S \omega$ of $F$ is defined, exactly as before, as a limit of approximating sums. Similarly, one can define the notions of a ‘continuous k-form $\omega$ on a Banach space $E$ with values in a Banach space $F$’ and of a ‘compact oriented k-manifold $S$ in a Banach space $E$’ in such a way that $\int_S \omega$ can be defined as before (Exercises 16-19). However, the domain of integration in these integrals is always a k-dimensional manifold (that is, essentially $\mathbb{R}^k$), and integration over a general Banach space is not defined, because integration over domains of $\mathbb{R}^k$ depends on more than just the vector space structure, the norm, and the completeness of $\mathbb{R}^k$.

In summary, the abstract notion of a ‘Banach space’ gives a simple vocabulary for formulating many of the basic definitions and theorems of calculus in such a way that they are applicable under very general circumstances.

Exercises


1 Prove that a composition of continuous functions is continuous. That is, prove that if $f: E \to F$, $g: F \to G$ are functions (where $E, F, G$ are Banach spaces), if $f$ is continuous at $x$, and if $g$ is continuous at $f(x)$, then $g \circ f: E \to G$ is continuous at $x$.


2 Prove that if $f: E \to F$ and $g: F \to G$ are uniformly continuous on subsets $X$ of $E$ and $Y$ of $F$, and if $f(x)$ is in $Y$ whenever $x$ is in $X$, then $g \circ f: E \to G$ is uniformly continuous on $X$.


3 Prove the Chain Rule: If $f: E \to F$ is differentiable at a point $\bar{x}$ of $E$ and if $g: F \to G$ is differentiable at $f(\bar{x})$, then $g \circ f: E \to G$ is differentiable at $\bar{x}$ and its derivative mapping $E \to G$ is the composition of the derivatives of $f$ and $g$.


4 Prove that if $f: E \to F$ is uniformly differentiable on a set $X$ then its derivative $L_x(h)$ is linear in $h$ for each $x$. [The proof that $L_x(ah) = aL_x(h)$ is easy. To prove that $L_x(h + h') = L_x(h) + L_x(h')$ show that
$$\lim_{s, t \to 0} \frac{f(x + sh + sh' + tk) - f(x + sh + tk) - f(x + sh' + tk) + f(x + tk)}{s}$$
exists and is independent of $k$. Then show that it is zero for all $h, h'$ and hence that $L_x(h + h') = L_x(h) + L_x(h')$.]


5 Prove that a linear map $L: E \to F$ is continuous if and only if there is a constant $B$ such that $|L(x)| \le B|x|$, in which case it is uniformly continuous.


6 Use 5 to prove that if $E$ is the Banach space $\mathbb{R}^n$ with the norm $|x|_\infty$ then any linear map $L: E \to F$ is continuous.


7 The product $E_1 \times E_2 \times \cdots \times E_n$ of a finite number of Banach spaces $E_1, E_2, \dots, E_n$ is defined to be the set of n-tuples $(x_1, x_2, \dots, x_n)$ in which $x_i$ is an element of $E_i$ ($i = 1, 2, \dots, n$), with the vector space operations defined componentwise and with the norm defined by
$$|(x_1, x_2, \dots, x_n)| = \max(|x_1|, |x_2|, \dots, |x_n|).$$
Show that $E_1 \times E_2 \times \cdots \times E_n$ is a Banach space. What is the Banach space $\mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}$? [Note that there is only one norm on the Banach space $\mathbb{R}$.]


8 Let $L(\mathbb{R}^n, \mathbb{R}^m)$ denote the set of all linear maps $\mathbb{R}^n \to \mathbb{R}^m$, i.e. the set of all $m \times n$ matrices $M$. Show that $L(\mathbb{R}^n, \mathbb{R}^m)$ is a vector space and that the norm
$$|M| = \max_{|x|_\infty = 1} |Mx|_\infty$$
makes it a Banach space. Find the norm $|M|$ as an explicit function of the matrix $M$.


9 Let $L(E, F)$ denote the set of all continuous linear maps $M: E \to F$ where $E, F$ are Banach spaces. Show that $L(E, F)$ is a vector space, and show that
$$|M| = \operatorname{l.u.b.}_{|x| \le 1} |Mx| = \text{least number } B \text{ which fulfills the property of Exercise 5}$$
defines a norm on $L(E, F)$ with respect to which $L(E, F)$ is complete. Hence $L(E, F)$ is a Banach space.


10 Let $E$ be the space $\mathbb{R}^n$ with the p-norm $|(x_1, \dots, x_n)| = (|x_1|^p + \cdots + |x_n|^p)^{1/p}$ ($p > 1$). Show that the Banach space $L(E, \mathbb{R})$ can be identified with $\mathbb{R}^n$ in the q-norm, where
$$\frac{1}{p} + \frac{1}{q} = 1.$$


*If $E$ is finite-dimensional then $ML = I$ implies $LM = I$. This is no longer true if $E$ is infinite-dimensional (see Exercise 21).


11 In the Banach space $L(E, E)$ of continuous linear maps of a Banach space $E$ to itself there is an operation of composition assigning to a pair of maps $M_1: E \to E$, $M_2: E \to E$ a new map $M_2 \circ M_1: E \to E$. Show that $|M_2 \circ M_1| \le |M_2|\,|M_1|$.


12 An element $L$ of the Banach space $L(E, E)$ is said to be invertible if there is an element $M$ of $L(E, E)$ such that* $ML = LM = I$, where $I$ denotes the identity map. When this is the case, $M$ is called the inverse of $L$ and is denoted $L^{-1}$. Show that if $L$ is given and if $M, N$ are given such that $|ML - I| < 1$, $|LN - I| < 1$, then $L$ is invertible.


13 Refine the argument of 12 to show that the function ‘inverse’ from $L(E, E)$ to itself is continuous at all points where it is defined. That is, show that if $L$ is invertible and if $L_1$ is near $L$, then $L_1$ is invertible and $L_1^{-1}$ is near $L^{-1}$. This fact is used in the following proof of the Inverse Function Theorem.


14 Write a complete proof of the Inverse Function Theorem: Let $f: E \to F$ be a map from a Banach space $E$ to a Banach space $F$, let $\bar{x}$ be a point of $E$, let $K$ be a real number such that $f$ is continuously differentiable on the ‘cube’ $\{|x - \bar{x}| < K\}$, and let $L_{\bar{x}}: E \to F$ be the derivative of $f$ at $\bar{x}$. If $L_{\bar{x}}(h) = k$ can be solved for $h$ as a continuous function of $k$, then $f(x) = y$ can be solved for $x$ as a continuously differentiable function of $y$ for all $y$ sufficiently near $f(\bar{x})$. More precisely, if there is a continuous map $M: F \to E$ such that $M(k) = h$ is equivalent to $k = L_{\bar{x}}(h)$ (all $h, k$), then there is an $\epsilon > 0$ and a map $g: F \to E$, defined and continuously differentiable on $\{|y - f(\bar{x})| < \epsilon\}$, such that $y = f(x)$ is equivalent to $g(y) = x$ for all $x, y$ satisfying $|x - \bar{x}| < \epsilon$, $|y - f(\bar{x})| < \epsilon$. [Use the method of successive approximation to show that $f(x) = y$ has a unique solution $x$ for all $y$ sufficiently near $f(\bar{x}) = \bar{y}$. Call this function $g$. Show that $g$ is differentiable at $\bar{y}$ with derivative $M$. Conclude that $g$ is differentiable at all points where it is defined, hence that $g$ is continuous. Finally, the derivative $(L_{g(y)})^{-1}$, being a composition of continuous functions, is continuous.]


15 Deduce the Implicit Function Theorem stated in the 
text from the Inverse Function Theorem proved above. 


16 A ‘constant k-form on $E$ with values in $F$’ is a function $\phi: E \times E \times \cdots \times E \to F$ assigning elements of $F$ to k-tuples $(v_1, v_2, \dots, v_k)$ of elements of $E$ such that: (i) $\phi$ is continuous. (ii) $\phi$ is linear in each of its $k$ variables, i.e. $\phi(av_1 + bv_1', v_2, v_3, \dots, v_k) = a\phi(v_1, v_2, \dots, v_k) + b\phi(v_1', v_2, \dots, v_k)$, etc. (iii) $\phi$ is alternating, that is, interchanging two of the $k$ variables in $\phi$ changes the sign of its value in $F$. Show that a ‘constant k-form on $\mathbb{R}^n$ with values in $\mathbb{R}$’ in the sense of §4.2 gives rise to a ‘constant k-form on $\mathbb{R}^n$ with values in $\mathbb{R}$’ in the sense just defined.




17 Let $A_k(E; F)$ denote the set of all constant k-forms on $E$ with values in $F$. Given a continuous linear map $L: E_1 \to E_2$, define the pullback map $L^*: A_k(E_2; F) \to A_k(E_1; F)$ in a manner consistent with the previous definition, and prove the Chain Rule $(L \circ M)^* = M^* \circ L^*$.


18 A ‘continuous k-form on $E$ with values in $F$’ is a continuous map $E \to A_k(E; F)$. Outline a definition of $\int_S \omega$ for $S$ a compact, oriented, k-dimensional manifold in $E$ (defining this concept) and for $\omega$ a continuous k-form on $E$.


19 Show that if $|x|$ is any norm on a vector space $V$ then $|x| \ge 0$ for all $x$, i.e. the axioms (i)-(iii) imply $|x| \ge 0$.


20 Show that if $|x|$ is any norm on $\mathbb{R}^n$ then there exist positive constants $c, C$ such that $c|x|_\infty \le |x| \le C|x|_\infty$. Conclude that any linear map $L: E \to F$ in which the domain $E$ is finite-dimensional must be continuous. [The inequality $|x| \le C|x|_\infty$ is easy. The inequality $c|x|_\infty \le |x|$ can be proved by considering $(\mathbb{R}^n)^* = L(\mathbb{R}^n, \mathbb{R})$ (see Exercise 10, §4.5). The last statement then follows from Exercise 6.]


21 Let $E$ be the Banach space of all absolutely convergent series; that is, let $E$ be the vector space of all infinite sequences $x = (x_1, x_2, x_3, \dots)$ of real numbers such that $\sum_{i=1}^{\infty} |x_i|$ converges, and let $|x|$ denote the norm $\sum_{i=1}^{\infty} |x_i|$ on $E$. Let $L$ denote the operator ‘shift right’ defined by $L(x_1, x_2, x_3, \dots) = (0, x_1, x_2, \dots)$. Show that $L$ is in $L(E, E)$, and that there is an $M$ in $L(E, E)$ such that $ML = I$, but that $L$ is not invertible.
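Exercise 21 can be explored on sequences with only finitely many nonzero terms, which lie in $E$ and are easy to represent as Python lists with the trailing zeros omitted. In this sketch the left shift is one possible choice of $M$ (an assumption of the sketch, not given in the exercise): it satisfies $ML = I$, while $LM$ sends $(1, 0, 0, \dots)$ to the zero sequence, so $LM \ne I$. The computation illustrates the claims on examples; it does not prove them.

```python
def norm1(x):
    """|x| = sum of |x_i| for a sequence with finitely many nonzero terms."""
    return sum(abs(xi) for xi in x)

def shift_right(x):
    """L(x1, x2, x3, ...) = (0, x1, x2, ...)."""
    return [0.0] + list(x)

def shift_left(x):
    """One candidate M with M L = I: M(x1, x2, x3, ...) = (x2, x3, ...)."""
    return list(x[1:])

x = [1.0, -2.0, 3.0]
print(shift_left(shift_right(x)) == x)     # True: M L x = x
print(shift_right(shift_left([1.0])))      # [0.0]: L M sends (1, 0, 0, ...) to 0
print(norm1(shift_right(x)) == norm1(x))   # True: |Lx| = |x|, so L is continuous
```

Because $|Lx| = |x|$, the bound of Exercise 5 holds with $B = 1$; the failure of $LM = I$ comes from the fact that no sequence in the range of $L$ has a nonzero first term.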


Appendix 1 | The Cauchy Criterion


The central idea of calculus is the idea of the conver- 
gence of an infinite process. Because the Cauchy Con- 
vergence Criterion is the precise formulation of this 
idea, a clear understanding of the Cauchy Criterion is 
fundamental to an understanding of calculus. The 
Cauchy Criterion is emphasized at several points in this 
book: in the definition of the definite integral (§2.3, 
§6.2, §9.5), in the discussion of the convergence of a 
method of successive approximations (Chapter 7), in 
the definition of real numbers (§9.1), in the definition of 
infinite products (§9.5), and in the definition of Lebesgue 
integrals (§9.7). In this appendix the idea of convergence 
is reviewed from a picturesque and non-technical point 
of view. 

Imagine that a number is to be determined by an 
experiment which involves some sort of apparatus. 
Imagine also that the apparatus is such that it can be 
set up with varying degrees of care and that different 
settings of the apparatus normally produce different 
results. The experiment can then be thought of as an 
infinite process, not because any single performance of 
the experiment is infinite, but because the apparatus 
can be set up in infinitely many different ways. The 
infinite process represented by the experiment is convergent if it is true that the resulting number can be determined to within any arbitrarily small margin for error by setting up the apparatus with sufficient care. This is the Cauchy Criterion.

H.M. Edwards, Advanced Calculus: A Differential Forms Approach, Modern Birkhauser Classics,
DOI 10.1007/978-0-8176-8412-9, © Harold M. Edwards 2014

More fully, one should imagine that there is some way 
of describing the amount of care which has been used in 
setting up the apparatus in any one performance of the 
experiment. Convergence of the experiment then means 
that for any given margin for error there is a degree of 
care such that any two results of the experiment will differ 
by less than the given margin for error provided only that 
the apparatus is set up with at least the specified degree of 
care. 

Consider, for example, the determination of the number $\pi$ (see §7.5). The formula
$$\pi = 16\left[\frac{1}{5} - \frac{1}{3}\left(\frac{1}{5}\right)^3 + \frac{1}{5}\left(\frac{1}{5}\right)^5 - \frac{1}{7}\left(\frac{1}{5}\right)^7 + \cdots\right] - 4\left[\frac{1}{239} - \frac{1}{3}\left(\frac{1}{239}\right)^3 + \frac{1}{5}\left(\frac{1}{239}\right)^5 - \cdots\right] \tag{1}$$
gives a simple experiment for finding $\pi$. The number of decimal places which are retained in the calculations and the number of terms of the infinite series which are used represent the degree of care with which the experiment is performed. To say that this method converges means simply that if more and more decimal places and terms are retained then more and more decimal places of the answer will remain the same. This is the sense in which the formula (1) determines the real number $\pi$ as the limit of a convergent infinite process.
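The ‘experiment’ of formula (1) can be run directly. The sketch below uses Python’s standard `decimal` module and treats the working precision (significant digits, set through `getcontext().prec`) and the number of series terms as the ‘degree of care’; the particular pairs of settings are arbitrary choices for illustration. As both are increased, more and more leading digits of the output stop changing.

```python
from decimal import Decimal, getcontext

def arctan_inverse(m, terms):
    """Partial sum of arctan(1/m) = 1/m - (1/3)(1/m)^3 + (1/5)(1/m)^5 - ...,
    using the first `terms` terms, at the current decimal precision."""
    total = Decimal(0)
    power = Decimal(1) / m              # (1/m)^(2k+1), starting at k = 0
    for k in range(terms):
        term = power / (2 * k + 1)
        total = total + term if k % 2 == 0 else total - term
        power = power / (m * m)
    return total

def machin_pi(digits, terms):
    """Formula (1) with `digits` significant digits of working precision and
    `terms` terms of each series. Note: this changes the global decimal context."""
    getcontext().prec = digits
    return 16 * arctan_inverse(5, terms) - 4 * arctan_inverse(239, terms)

for digits, terms in [(10, 5), (20, 12), (30, 20), (40, 28)]:
    print(digits, terms, machin_pi(digits, terms))
```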
The definition
$$\pi = \iint_D dx\,dy,$$
where $D$ is the disc $\{x^2 + y^2 \le 1\}$ oriented $dx\,dy$, describes another experiment (convergent infinite process) for determining $\pi$. This experiment is examined in detail in Exercise 1, §2.3.
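One concrete way to run this ‘experiment’, assumed here for illustration and not necessarily the approximating sums used in Exercise 1, §2.3: cover the square $\{-1 \le x \le 1, -1 \le y \le 1\}$ by an $n \times n$ grid of small squares and add up the areas of those whose centers lie in $D$. Refining the grid plays the role of greater care in setting up the apparatus.

```python
def disc_area_estimate(n):
    """Approximating sum for the integral of dx dy over D = {x^2 + y^2 <= 1}:
    split {-1 <= x <= 1, -1 <= y <= 1} into n*n squares of side 2/n and add the
    areas of those squares whose centers lie in D."""
    side = 2.0 / n
    count = 0
    for i in range(n):
        x = -1.0 + (i + 0.5) * side
        for j in range(n):
            y = -1.0 + (j + 0.5) * side
            if x * x + y * y <= 1.0:
                count += 1
    return count * side * side

for n in [10, 100, 400]:
    print(n, disc_area_estimate(n))
```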

It is worthwhile to observe that no physical experi- 
ment is convergent. That is, beyond a certain point any 
physically defined number becomes indeterminate. 
Examples of physical constants which can be determined 
to ten significant digits are rare, let alone constants 
which can be determined to a hundred significant digits 
or a thousand. Thus real numbers such as $\sqrt{2}$ and $\pi$ are not at all ‘real’ in the fundamental sense of the word but, on the contrary, are prototypically ideal, that is, ‘real’ in the Platonic sense.


Appendix 2 | The Leibniz Notation

*See F. Cajori, A History of Mathematical Notations, Vol. II, The Open Court Publishing Co., Chicago.

†On the level of elementary calculus the notation $\frac{dy}{dx}$ is so useful that only the most stubbornly ‘modern’ textbooks reject it completely. At the other end of the spectrum, on the level of differential geometry and advanced analysis, differential forms are very widely accepted.


The notation of differential forms is a generalization of the Leibniz notation for calculus. Although there is a great deal of controversy as to whether Newton or Leibniz should be credited with the invention of calculus, there is no question that it was Leibniz who introduced the notations $\frac{dy}{dx}$ and $\int y(x)\,dx$ for the derivative and the integral, and there is no question that this notation is superior to that of Newton.* Many historians of mathematics believe, in fact, that British mathematics was greatly impeded by the insistence of British mathematicians on using the Newtonian notation while German mathematics flourished using the notation of Leibniz. Nonetheless, the Leibniz notation has certain disadvantages which have caused it to be somewhat neglected in recent years, particularly on the level of intermediate mathematics.†

Perhaps the most serious defect of the Leibniz notation is the fact that it leaves so much unsaid. For example, in writing
$$\frac{dy}{dx} = 2x$$
one is actually saying, “$y$ represents a dependent variable which depends in some unspecified or understood way on an independent variable $x$, and the derivative of this functional relation between $y$ and $x$ is the function $2x$.” The general tendency of contemporary mathematics is to make all statements as explicit as possible and to avoid such modes of terminology as this in which a great deal of what is being said is left implicit.

However, if one is concerned with computations and 
applications rather than with the abstract theory, then 
one is quickly led to the conclusion that the Leibniz 
notation is incomparably more efficient and that what 
it leaves unsaid is invariably either unimportant or is 
easily understood from the context. For example, the 
chain rule for functions of one variable
$$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx} \tag{1}$$
could only mean: “$z$ is a function of $y$, and $y$ is a function of $x$. When $z$ is considered as a function of $x$ (the composed function) its derivative is the product of the derivatives $\frac{dz}{dy}$ and $\frac{dy}{dx}$. This derivative is a function of $x$, and $\frac{dz}{dy}$ is therefore to be considered as a function of $x$ (by composition).” An alternative statement of (1) which is commonly found in contemporary calculus books is
$$(f \circ g)'(x) = f'(g(x))\,g'(x). \tag{2}$$


This is a somewhat fuller statement of the chain rule 
than is (1), but the greater detail is bought at the cost 
of a great loss of clarity. One of the main objectives of 
this book has been to point out once again the superiority 
of the Leibniz notation. See, for example, the implicit 
differentiations in §5.2, the method of Lagrange multi- 
pliers in §5.4, and the integrability conditions of §8.6; 
each of these topics would be significantly more com- 
plicated without the Leibniz notation. 
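The content of the chain rule (1) can be checked numerically in a case where everything is explicit. The sketch below takes the hypothetical relations $y = x^2 + 1$ and $z = \sin y$ (not from the text) and compares a finite-difference estimate of $\frac{dz}{dx}$ with the product $\frac{dz}{dy}\frac{dy}{dx}$, where $\frac{dz}{dy}$ is evaluated at $y = y(x)$, which is the sense in which it is ‘considered as a function of $x$ (by composition)’.

```python
import math

def derivative(func, x, h=1e-6):
    """Central-difference approximation to the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

def y_of_x(x):
    return x ** 2 + 1          # hypothetical: y as a function of x

def z_of_y(y):
    return math.sin(y)         # hypothetical: z as a function of y

def z_of_x(x):
    return z_of_y(y_of_x(x))   # z considered as a function of x (the composed function)

x = 0.7
dz_dx = derivative(z_of_x, x)
# dz/dy is evaluated at y = y(x): it is "considered as a function of x by composition"
dz_dy = derivative(z_of_y, y_of_x(x))
dy_dx = derivative(y_of_x, x)
print(dz_dx, dz_dy * dy_dx)    # the two agree up to finite-difference error
```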

A second disadvantage of the Leibniz notation which has caused it to be neglected in recent years is the fact that it expresses the derivative $\frac{dy}{dx}$ as a quotient of two terms $dx$, $dy$ which have not been defined. Is it indeed a quotient, and, if so, what are $dx$ and $dy$? This question was central to the philosophical objections which were raised to the new calculus in the 17th and 18th centuries. In the 19th century the foundations of mathematics were profoundly reworked in order to establish calculus on a firm logical basis independent of vague notions of infinitesimals. This was done essentially by emphasizing the functional relationship, by writing $y = f(x)$, and by writing the derivative as a new function $f'(x)$. This relegated the Leibniz notation $\frac{dy}{dx}$ to the status of a convenient mnemonic device whose convenience was largely accidental and whose validity was highly suspect.

*Many writers refer to the derivative $\frac{dy}{dx}$ as the ‘differential coefficient of $y$ with respect to $x$’. This terminology agrees perfectly with the viewpoint adopted here, namely that $\frac{dy}{dx}$ is the coefficient of $dx$ in (3).
The viewpoint adopted in this book is that the pullback operation is of central importance and that $\frac{dy}{dx}$ is not a quotient but rather is the coefficient of $dx$ in the pullback of $dy$ under a function $y = f(x)$. That is, $f'(x) = \frac{dy}{dx}$ means that the pullback of $dy$ is $f'(x)\,dx$. At the cost of some precision but without grave danger of misunderstanding one can write
$$dy = f'(x)\,dx \tag{3}$$
meaning “the pullback of $dy$ is $f'(x)\,dx$.” This is the meaning of the equation $f'(x) = \frac{dy}{dx}$.* More generally, if $y$ is a function of several variables $y = f(x_1, x_2, \dots, x_n)$ then (3) becomes
$$dy = \frac{\partial f}{\partial x_1}\,dx_1 + \frac{\partial f}{\partial x_2}\,dx_2 + \cdots + \frac{\partial f}{\partial x_n}\,dx_n$$
or, as it has usually been written in this book,
$$dy = \frac{\partial y}{\partial x_1}\,dx_1 + \frac{\partial y}{\partial x_2}\,dx_2 + \cdots + \frac{\partial y}{\partial x_n}\,dx_n.$$
Here the “=” sign is not precise and one actually means that the right side is the pullback of $dy$ under an understood function from $(x_1, x_2, \dots, x_n)$ to $y$. The meaning of the pullback operation, and hence the meaning of the derivative $\frac{dy}{dx}$, is discussed in detail in §5.3 (see in particular formulas (3) and (4) of §5.3).


Appendix 3 | On the Foundations of Mathematics


Many readers will be surprised by the fact that the dis- 
cussion of the real number system in §9.1 contains no 
definition of ‘number’—not even of ‘natural number’. 
However, there should actually be nothing at all sur- 
prising about this. In defining any term, one must make 
use of other terms; if the definition is to be useful, then 
the terms used in the definition must be more simple 
and more familiar than the term being defined. But 
what could be more simple and more familiar than the 
idea of ‘natural number’? Hence, how could ‘natural 
number’ possibly be defined? 

There are various schools of thought concerning the 
foundations of mathematics and many questions are 
vigorously debated. In particular, the point of view that 
‘natural number’ cannot be defined would be contested 
by many mathematicians who would maintain that the 
concept of ‘set’ is more primitive than that of ‘number’ 
and who would use it to define ‘number’. Others would 
contend that the idea of ‘set’ is not at all intuitive and 
would contend that, in particular, the idea of an infinite 
set is very nebulous. They would consider a definition 
of ‘number’ in terms of sets to be an absurdity because it 
uses a difficult and perhaps meaningless concept to define 
a simple one. In short, they would contend that anyone who does not understand the meaning of ‘natural number’ has no chance of understanding the meaning of ‘set’.

The point of view adopted in §9.1 (and adopted in 
§4.2 and §5.2 where analogous questions arise) is that 
mathematics is active, deductive, and computational, 
and that the important thing in mathematics is to define 
the logical and computational relationship between the 
terms employed. The terms themselves are inert and 
devoid of meaning other than the meaning imparted to 
them by the way in which they are used in active relation- 
ship with other terms. 


Appendix 4 | Constructive Mathematics


In the opinion of many mathematicians, the theorems 
of §9.4 are logically unacceptable because they are non- 
constructive existence theorems, that is, theorems which 
assert that something or other ‘exists’ without telling how 
to find it explicitly. These mathematicians hold that it is 
pointless to say that something ‘exists’ if there is no 
way of finding it. For example, they hold that it is 
pointless to assert that an infinite sequence in a compact 
set has a point of accumulation (the Bolzano-Weier- 
strass Theorem) because there may be no way whatsoever 
of finding a point of accumulation. Either a point of 
accumulation can actually be found (in which case the 
theorem can be improved on) or there is no way to find 
a point of accumulation (in which case the theorem is 
futile). 

If one adopts this constructive view of mathematical 
existence then several of the theorems of this book must 
be modified (and the theorems of §9.4 must be rejected 
altogether). However, the modifications are not as exten- 
sive as one might at first imagine, and the useful theorems 
of calculus survive it intact. In fact, a careful, construc- 
tive restatement of the theorems of calculus clarifies 
them and heightens their usefulness. 

The proof that “the integral $\iint_R A(x, y)\,dx\,dy$ of a continuous 2-form $A\,dx\,dy$ over a rectangle $R$ of the xy-plane converges” (§2.3 and §6.3) used the theorem that a continuous function on $R$ is necessarily uniformly continuous on $R$ (Theorem 1, §9.4). From the constructivist point of view it is necessary to assume not only that the integrand $A\,dx\,dy$ is uniformly continuous but also that for any given $\epsilon$ one can explicitly find a $\delta$ such that $|A(x, y) - A(\bar{x}, \bar{y})| < \epsilon$ whenever $|x - \bar{x}| < \delta$, $|y - \bar{y}| < \delta$. This assumption is necessary in order to be able to state explicitly how $\iint_R A\,dx\,dy$ can be computed with any prescribed degree of accuracy in a finite number of steps. This additional assumption on $A(x, y)$ is perfectly natural from the constructivist point of view because the very evaluation of $A(x, y)$ virtually requires that given $\epsilon$ a corresponding $\delta$ can be found (see §9.3). That is, if $A(x, y)$ does not satisfy the additional assumption, then from the constructivist point of view the integrand $A(x, y)$ is not even a well-defined function; hence the integral $\iint_R A\,dx\,dy$ is without meaning and there is no point in discussing its convergence.

*In the constructive sense that, given $\epsilon$, a corresponding $\delta$ can be found explicitly which satisfies the condition which defines uniform differentiability in §9.3.

†However, the alternative proof given in the exercise of §5.3 is nonconstructive.

Similarly, from the constructivist point of view the Fundamental Theorem $F(b) - F(a) = \int_a^b F'(t)\,dt$ would be considered only in the case where $F(t)$ is uniformly differentiable on the interval $\{a \le t \le b\}$.* In this case it is easily shown that for every $\epsilon > 0$ there is an (explicitly constructible) $\delta > 0$ such that any approximating sum $\sum F'(t_i)\,\Delta t_i$ to $\int_a^b F'(t)\,dt$ differs by less than $\epsilon$ from $F(b) - F(a)$ whenever all $\Delta t_i$ are less than $\delta$ (see Exercise 4, §9.3).
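For a concrete instance of this constructive claim, take the hypothetical case $F(t) = t^2$ (not from the text). Here $\delta$ can be written down explicitly: on a subinterval $[u, v]$ with tag $t_i$, $F(v) - F(u) - F'(t_i)(v - u) = (u + v - 2t_i)(v - u)$, whose absolute value is at most $(v - u)^2$, so the total error is at most $\max \Delta t_i \cdot (b - a)$ and $\delta = \epsilon/(b - a)$ works. The sketch tests this $\delta$ on random partitions with random tags; the algebra just given is the proof, and the runs only illustrate it.

```python
import random

def F(t):
    return t * t               # hypothetical example

def F_prime(t):
    return 2 * t

def delta_for(eps, a, b):
    """Explicit delta for F(t) = t^2 only: each term of the error is at most dt_i^2,
    so the total error is at most max(dt_i) * (b - a) < delta * (b - a) = eps."""
    return eps / (b - a)

def random_approximating_sum(a, b, delta, rng):
    """Sum of F'(t_i) * dt_i over a random partition of [a, b] with every dt_i < delta,
    each tag t_i chosen at random in its subinterval."""
    total, u = 0.0, a
    while u < b:
        dt = min(rng.uniform(0.5, 0.99) * delta, b - u)
        t = rng.uniform(u, u + dt)
        total += F_prime(t) * dt
        u += dt
    return total

rng = random.Random(1)
a, b, eps = 0.0, 2.0, 1e-3
delta = delta_for(eps, a, b)
errors = [abs(random_approximating_sum(a, b, delta, rng) - (F(b) - F(a))) for _ in range(20)]
print(delta, max(errors), max(errors) < eps)
```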

Similarly, the statement of Stokes’ Theorem can be 
amplified so that it becomes constructively true. The 
proof of the Implicit Function Theorem given in §7.1 is constructive,† so that this theorem is perfectly acceptable from the constructivist point of view. However, not
every theorem can be interpreted constructively. A very 
surprising exception is the trichotomy law of §9.1, that is, 
the ‘law’ that every real number is either positive, negative, 
or zero. The following example shows that this ‘law’ is 
not entirely self-evident: 

The so-called Goldbach conjecture states that every 
even number greater than 4 can be written as the sum 
of two prime numbers. So far as is known, this conjecture 
is true. For example,
$$6 = 3 + 3,\quad 8 = 3 + 5,\quad 10 = 3 + 7 = 5 + 5,\quad 12 = 5 + 7,\quad 14 = 3 + 11 = 7 + 7,\quad 16 = 3 + 13 = 5 + 11, \dots$$


For larger numbers there are more possibilities so that 
on a purely empirical basis the Goldbach conjecture 
appears very, very likely once it has been tested for low 
numbers. It has in fact been tested extensively by computer and no exception has ever been found. Consider
now the real number
$$r = .a_1a_2a_3a_4\cdots = \frac{a_1}{10} + \frac{a_2}{100} + \frac{a_3}{1000} + \cdots$$
defined by
$$a_n = \begin{cases} 0 & \text{if } 2n + 4 \text{ can be written as the sum of two primes,} \\ 1 & \text{otherwise.} \end{cases}$$


The Goldbach conjecture is that r = 0. 
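Each digit $a_n$ can be computed in a finite number of steps, which is what makes r a well-defined real number in the constructive sense even though nobody knows whether r = 0. A minimal Python sketch of that digit-by-digit computation follows; the function names are illustrative (not from the text), and primality is tested by plain trial division, which is adequate only for small numbers.

```python
def is_prime(m: int) -> bool:
    """Trial-division primality test; adequate for the small numbers used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def goldbach_digit(n: int) -> int:
    """Digit a_n of r: 0 if 2n + 4 is a sum of two primes, 1 otherwise."""
    even = 2 * n + 4
    for p in range(2, even // 2 + 1):
        if is_prime(p) and is_prime(even - p):
            return 0
    return 1


def digits_of_r(count: int) -> str:
    """The first `count` decimal digits a_1 a_2 ... a_count of r."""
    return "".join(str(goldbach_digit(n)) for n in range(1, count + 1))


print("." + digits_of_r(40))
```

However many digits such a program produces, a finite run of zeros is consistent with both r = 0 and r > 0, which is the point of the example.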

Millions of decimal places of r are known, and they 
are all zero. However, in order to prove r = 0 it is
necessary to prove the Goldbach conjecture, and in
order to prove r > 0 it is necessary to disprove the
Goldbach conjecture. But it is quite conceivable that 
human (or inhuman) intelligence will never succeed 
either in proving or in disproving the Goldbach conjec- 
ture. Thus it may be that neither the statement r = 0 
nor the statement r > 0 will ever be proved. The con- 
structivist position is that it is pointless to assert, as 
the trichotomy law does, that either r = 0 or r > 0. 
What one means is simply that the statements r = 0 
and r > 0 are contradictory, that is, that both cannot 
be true. To put this statement in the form of the tri- 
chotomy law gives the mistaken impression that the 
Goldbach conjecture necessarily can be resolved one 
way or the other. 

What is involved is the so-called law of the excluded 
middle. If one proves that the denial of a statement is 
false, is one justified in concluding that the statement is 
true? Surprisingly enough, the answer is “no” if all 
statements are interpreted constructively. One might 
conceivably prove, for example, that the assumption 
r = 0 leads to a contradiction without being able to
prove constructively that r > 0. Similarly, if $x_0, x_1, x_2, x_3, \ldots$ is some given sequence of points in the interval
{0 ≤ x ≤ 1} it is quite conceivable that one could
succeed in proving that at most a finite number of the
$x_n$ lie in {½ ≤ x ≤ 1} without being able to prove
constructively that an infinite number of the $x_n$ lie in
{0 ≤ x ≤ ½}. In §9.4 the statement “there exist at
most a finite number of the $x_n$ in {½ ≤ x ≤ 1}” was



taken to imply “there exist an infinite number of the
$x_n$ in {0 ≤ x ≤ ½}.” In the constructivist view, however, mathematical existence means constructibility and
this implication is invalid. It is for this reason that the 
proofs of §9.4 are not acceptable from the constructivist 
point of view. 


the parable of the logician and the carpenter


appendix 5 


The following tale was told, without further explanation, 
by a man who had spent many years contemplating the 
real number system: 


Once upon a time, a Logician, seeking respite from 
the hurly-burly of academic life, came to pass a period 
of time in the country. Being in need of a desk to con- 
tinue his researches, he sought out a Carpenter in the 
neighboring village and asked whether the Carpenter 
might build him a desk. 

“You give me the proper specifications,” said the 
Carpenter, “and I'll build you any kind of a desk you
want.” 

“Heavens!” exclaimed the Logician, “do you realize 
that you just made a statement about the set of all 
possible desks? Do you have any conception of the 
logical complexities inherent in such a statement?” 

“Nonsense,” said the Carpenter. “I have no idea what 
a set is.” He paused. “Come to think of it, I'm not sure
just what you mean by a desk. I’m talking about car- 
pentry. You give me the specifications and I'll build you 
the desk.” 

“How quaint,” the Logician said to Mrs. Logician 
that night at home. “The chap was quite ignorant of 
even the rudiments of formal thought. Didn’t know what 
a set was. However, I expect he’ll build me a passable 
desk.” 

And indeed he did. 




H.M. Edwards, Advanced Calculus: A Differential Forms Approach, Modern Birkhäuser Classics, 
DOI 10.1007/978-0-8176-8412-9, © Harold M. Edwards 2014 


answers to exercises 


§1.1 pages 4-5 


1 (a) 2 (b) −33 (c) −14

2 (a) −13 (b) 8 dx − 11 dy

3 (a) The flow is zero across any segment PQ
of the form P = (x, y), Q = (x + 2t, y + 3t)
where t is an arbitrary number. Therefore the
flow lines are lines of the form x = const. + 2t,
y = const. + 3t or, what is the same, lines of
the form 3x − 2y = const. Since the flow across
the segment from (0, 0) to (0, 1) is −2 the flow
must be from upper right to lower left when the
axes are drawn in the usual way. (b) Flow is
along lines x + y = const. from upper left to
lower right. (c) Flow is along lines Ax + By =
const. A particle at (0, 0) passes across points
(Bt, −At) where t is a positive number.

4 łdx + dy + 13a 

5 Points on the plane 3x + 4y − z = 0 can
be reached without work. The direction of the
force is perpendicular to this plane. The directed line segment (vector) from (0, 0, 0) to
(−3, −4, 1) indicates the direction of the force.
6 (a) Draw the arrow from (0, 0) to (2, −3)
or any translate of this arrow. (b) Draw the
arrow from (0, 0) to (−3, −2) or any translate.
7 Set $P_i = (x_i, y_i, z_i)$. The desired formula is then
$$\sum_{i=1}^{n} [A(x_i - x_{i-1}) + B(y_i - y_{i-1}) + C(z_i - z_{i-1})] = A(x_n - x_0) + B(y_n - y_0) + C(z_n - z_0),$$
which follows from simple algebraic cancellation.


§1.2 page 7 
1 −33


2 −9. The oriented area of the projection on
the xy-plane can be found by observing that it is 




the same as the oriented area of the triangle 
(1, 1), (3, 5), (3, 0). The other two can be found 
by a similar method. 


3 The oriented area is positive if and only if
$x_1y_2 - x_2y_1 > 0$. This can be seen from the
fact that the oriented area of (0, 0), $(x_1, y_1)$,
$(-y_1, x_1)$ is positive.

4 The oriented area is positive if and only if
$x_1y_2 + x_2y_3 + x_3y_1 - x_2y_1 - x_3y_2 - x_1y_3 > 0$. Interchanging two vertices interchanges positive and negative terms.
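The sign test of Exercises 3 and 4 can be sketched in Python as follows. The helper names `oriented_area` and `orientation` are hypothetical, and the use of `fractions.Fraction` with integer vertex coordinates is an implementation choice that keeps the collinear case exact.

```python
from fractions import Fraction


def oriented_area(p1, p2, p3):
    """Oriented area of the triangle p1 p2 p3 (integer coordinates):
    (x1*y2 + x2*y3 + x3*y1 - x2*y1 - x3*y2 - x1*y3) / 2."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return Fraction(x1 * y2 + x2 * y3 + x3 * y1 - x2 * y1 - x3 * y2 - x1 * y3, 2)


def orientation(p1, p2, p3):
    """'counterclockwise', 'clockwise' or 'collinear' according to the sign of the area."""
    area = oriented_area(p1, p2, p3)
    if area > 0:
        return "counterclockwise"
    if area < 0:
        return "clockwise"
    return "collinear"


print(oriented_area((0, 0), (1, 0), (0, 1)))  # 1/2
print(orientation((0, 0), (0, 1), (1, 0)))    # clockwise
```

Swapping any two arguments flips the sign, which is the remark at the end of Exercise 4.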


§1.3 pages 13-15


1 The pullback of dx dy + 3 dx dz under
x = u + v, y = 2u + 4v, z = 3u is −7 du dv.
The pullback of dy dz + dz dx + dx dy under
x = 1 + 2u + 3v, y = 1 + 4u + v, z = 2 −
3u − v is −18 du dv. The pullback of dx dy
under $x = x_1u + x_2v$, $y = y_1u + y_2v$ is
$(x_1y_2 - x_2y_1)$ du dv.
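The computations in Exercise 1 rest on one rule: under a linear map, dp dq pulls back to $(p_u q_v - p_v q_u)\,du\,dv$, where $p_u, p_v$ are the coefficients of u and v in p. A small Python sketch of that rule (the dictionary layout and the function name are assumptions made for illustration) reproduces the first answer:

```python
def pullback_2form(form, linear_part):
    """Coefficient of du dv in the pullback of a constant 2-form under a linear map.

    form:        {(p, q): c} meaning c dp dq, e.g. {("x", "y"): 1} for dx dy
    linear_part: {var: (coefficient of u, coefficient of v)}; the constant terms
                 of the map do not affect the pullback.
    """
    total = 0
    for (p, q), c in form.items():
        pu, pv = linear_part[p]
        qu, qv = linear_part[q]
        # dp dq = (pu du + pv dv)(qu du + qv dv) = (pu*qv - pv*qu) du dv
        total += c * (pu * qv - pv * qu)
    return total


# dx dy + 3 dx dz under x = u + v, y = 2u + 4v, z = 3u
print(pullback_2form({("x", "y"): 1, ("x", "z"): 3},
                     {"x": (1, 1), "y": (2, 4), "z": (3, 0)}))  # -7
```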

2 Self-checking 

3 (a) −15 (b) −45 (c) −35

4 (a) $\tfrac12(x_1y_2 - x_2y_1)$ (b) $\tfrac12(x_0y_1 - x_1y_0) + \tfrac12(x_1y_2 - x_2y_1) + \tfrac12(x_2y_0 - x_0y_2)$ (c) Total
flow is zero. The three terms of (b) correspond
to the three sides which touch S, and the original
triangle PQR is the fourth side. (d) ‘No’ for
ordinary area, ‘yes’ for oriented area.


5 (a) $\tfrac12(x_0y_1 - x_1y_0) + \tfrac12(x_1y_2 - x_2y_1) + \tfrac12(x_2y_3 - x_3y_2) + \tfrac12(x_3y_0 - x_0y_3)$. (b) If
the vertices are $(x_i, y_i)$ for $i = 1, 2, \ldots, n$
(in that order) then the oriented area is
$\tfrac12\sum_{i=1}^{n}(x_{i-1}y_i - x_iy_{i-1})$ where $(x_0, y_0) = (x_n, y_n)$ by definition. (c) If $P = (x_P, y_P, z_P)$,
$Q = (x_Q, y_Q, z_Q)$ then the edge PQ contributes a
term $\tfrac12(x_Py_Q - x_Qy_P)$ to the value of dx dy on



Answers to Exercises | pages 23-24 469 


one polygon and minus this amount to the value 
on another. Hence all terms cancel. (d) dy dz, 
dz dx are just like dx dy. (e) Given two sur- 
faces with the same boundary, reversing the ori- 
entation of one gives a closed surface and the 
desired result follows from (d). 
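Formula (b) of Exercise 5 is the familiar "shoelace" formula. A hedged Python sketch, assuming integer vertex coordinates listed in order; Python's negative indexing supplies the wraparound convention that the vertex before the first one is the last one:

```python
from fractions import Fraction


def polygon_oriented_area(vertices):
    """Oriented area (1/2) * sum (x_{i-1} y_i - x_i y_{i-1}) of a polygon whose
    integer-coordinate vertices are listed in order; vertices[-1] plays the role
    of the vertex preceding vertices[0], i.e. the indices wrap around."""
    total = 0
    for i in range(len(vertices)):
        x_prev, y_prev = vertices[i - 1]
        x_i, y_i = vertices[i]
        total += x_prev * y_i - x_i * y_prev
    return Fraction(total, 2)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(polygon_oriented_area(square))        # 1 (counterclockwise)
print(polygon_oriented_area(square[::-1]))  # -1 (clockwise)
```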


6 Flow is in the direction of the line segment
from (0, 0, 0) to (3, −7, 11). The pullback of
the given 2-form under a map of the form
$x = 3u + x_0v$, $y = -7u + y_0v$, $z = 11u + z_0v$
is zero.

7 (e) The pullback is −dx dy. Hence the map
reverses orientations. This is easily shown by 
drawings. 


8 The general case of this argument occurs in 
Exercise 3, §4.3, where it is simplified by matrix 
notation. 


§1.4 pages 18-19


1 The composed map is x = 2(r + 3s) +
(2s + t) = 2r + 8s + t, y = 3r + 11s + t,
z = 2r + 4s + 2t, hence
$$dx\,dy\,dz = (2\,dr + 8\,ds + dt)(3\,dr + 11\,ds + dt)(2\,dr + 4\,ds + 2\,dt)$$
$$= [(22 - 24)\,dr\,ds + (2 - 3)\,dr\,dt + (8 - 11)\,ds\,dt](2\,dr + 4\,ds + 2\,dt)$$
$$= [-4 + 4 - 6]\,dr\,ds\,dt = -6\,dr\,ds\,dt.$$
The pullback of du dv dw under the first map is 6 dr ds dt, and
the pullback of dx dy dz under the second map is
−du dv dw, hence the pullback of the pullback
is −6 dr ds dt, as was to be shown.
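The coefficient of dr ds dt in the pullback of dx dy dz under a linear map is the 3 × 3 determinant of the map's coefficients (a standard fact, consistent with the expansion above), and the "pullback of the pullback" statement corresponds to det(PQ) = det(P) det(Q). A Python sketch; the sample matrices P and Q are hypothetical and are not the two maps of the exercise:

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows (cofactor expansion)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def compose(outer, inner):
    """Coefficient matrix of the composite linear map 'outer after inner'."""
    return [[sum(outer[r][k] * inner[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


# Rows: coefficients of r, s, t in x = 2r + 8s + t, y = 3r + 11s + t, z = 2r + 4s + 2t
composed = [[2, 8, 1], [3, 11, 1], [2, 4, 2]]
print(det3(composed))  # -6, the coefficient of dr ds dt

# Hypothetical sample maps: the determinant of a composite is the product of determinants
P = [[1, 3, 0], [0, 2, 1], [1, 0, 1]]
Q = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]
print(det3(compose(P, Q)) == det3(P) * det3(Q))  # True
```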

2 Self-checking 


3 (a) Counterclockwise (b) Clockwise 

(c) Collinear (d) Counterclockwise 

4 (a) Right-handed (b) Right-handed 

(c) Coplanar 

5 The pullback of dx dy dz under x = u +
v + w, y = v + w, z = w is du dv dw. The
tetrahedron with vertices (0, 0, 0), (1, 0, 0),
(1, 1, 0), (1, 1, 1) is described by the inequalities
{0 ≤ z ≤ y ≤ x ≤ 1}. The unit cube is divided
into six such tetrahedra, one for each order of
the coordinates x, y, z.


6 (a) 32 (b) 1 (c) 0
7 By giving two non-coincident points $P_0$, $P_1$.


§1.5 page 27


1 3 du + 4 dv, 23 du + 2 dv, 3 dv, −51 du dv,
−183 du dv.


2 −3 dx + 11 dy + 27 dz, 7 dy + 15 dz,
−8 dy dz + 15 dz dx − 7 dx dy, −32 dy dz +
60 dz dx − 28 dx dy, −24 dy dz + 45 dz dx −
21 dx dy.


3 The work done between t = 0 and t = 3 is
the work required to go from (0, 0, 4) to
(9, 3, 13) which is (3·9) − (2·3) + (2·9) =
39. In general the work required is 3 dx −
2 dy + 2 dz = (3·3 dt) − (2·dt) + (2·3 dt) =
13 dt (the pullback), that is, the work required is
26 in the second time interval, and 26 in the
third.


4 A k-form in n variables has $\binom{n}{k}$ components,
where $\binom{n}{k}$ denotes the binomial coefficient
(see §4.2).
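The count $\binom{n}{k}$ comes from choosing which k of the n differentials appear, since reordering factors only changes the sign and a repeated factor gives zero. A short Python sketch that lists one basic k-form per k-element subset; writing each subset in the given variable order is a convention of this sketch (the text's 3-variable convention writes dz dx rather than dx dz, which differs only in sign):

```python
from itertools import combinations
from math import comb


def basis_k_forms(variables, k):
    """One basic k-form per k-element subset of the variables, in the given order."""
    return [" ".join("d" + v for v in subset) for subset in combinations(variables, k)]


forms = basis_k_forms(["x", "y", "z", "w"], 2)
print(forms)  # ['dx dy', 'dx dz', 'dx dw', 'dy dz', 'dy dw', 'dz dw']
print(len(forms) == comb(4, 2))  # True
```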

5 The amount of fluid which crosses the segment $(x_0, y_0)$, $(x_1, y_1)$ in unit time is the amount
of fluid in the parallelogram $(x_0, y_0)$, $(x_0 + A, y_0 + B)$, $(x_1 + A, y_1 + B)$, $(x_1, y_1)$. The flow
is + if this parallelogram is oriented counterclockwise, − otherwise. Hence the flow across
the segment from $(x_0, y_0)$ to $(x_1, y_1)$ is the oriented area of this parallelogram, which is
$A(y_1 - y_0) - B(x_1 - x_0)$. Thus ‘flow across’
is the 1-form A dy − B dx.

6 The flow across a parallelogram $(x_0, y_0, z_0)$,
$(x_1, y_1, z_1)$, $(x_2 + x_1 - x_0, y_2 + y_1 - y_0, z_2 + z_1 - z_0)$, $(x_2, y_2, z_2)$ is equal in magnitude
to the volume of the parallelepiped generated by
the parallelogram and the segment $(x_0, y_0, z_0)$,
$(x_0 + A, y_0 + B, z_0 + C)$ as in 5. Its sign is +
or − depending on whether the orientation
$(x_0, y_0, z_0)$, $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, $(x_0 + A, y_0 + B, z_0 + C)$ is right- or left-handed. Thus
it is the coefficient of du dv dw in the pullback of
dx dy dz under
$$x = x_0 + (x_1 - x_0)u + (x_2 - x_0)v + Aw$$
$$y = y_0 + (y_1 - y_0)u + (y_2 - y_0)v + Bw$$
$$z = z_0 + (z_1 - z_0)u + (z_2 - z_0)v + Cw.$$
This is $A[(y_1 - y_0)(z_2 - z_0) - (y_2 - y_0)(z_1 - z_0)] + \cdots$, which is the value of A dy dz +
B dz dx + C dx dy on the given parallelogram.
Thus this 2-form describes the flow.


§2.1 pages 23-24 


1 If the constant 1-form A dx + B dy + C dz
describes ‘work’ then the magnitude of the force
is $\sqrt{A^2 + B^2 + C^2}$ because a unit displacement in the direction opposing the force
is from (0, 0, 0) to $(A/\sqrt{A^2+B^2+C^2},\ B/\sqrt{A^2+B^2+C^2},\ C/\sqrt{A^2+B^2+C^2})$,
which requires an amount of work equal to
$(A^2 + B^2 + C^2)/\sqrt{A^2 + B^2 + C^2} = \sqrt{A^2 + B^2 + C^2}$. To say that the force at
(x, y, z) is radially outward means that it is of
the form cx dx + cy dy + cz dz where c > 0.




To say that its magnitude is proportional to $1/r^2$
means $\sqrt{(cx)^2 + (cy)^2 + (cz)^2} = k/r^2$ for
some positive constant k, hence $c = k/r^3$,
q.e.d.


2 If the constant 1-form A dy − B dx describes a planar flow then the magnitude of the
flow is $\sqrt{A^2 + B^2}$. To say that the flow at
(x, y) is radially outward means it is of the form
cx dy − cy dx. To say its magnitude is k/r
means $\sqrt{(cx)^2 + (cy)^2} = k/r$, $c = k/r^2$ and
the 1-form is $k(x\,dy - y\,dx)/r^2$.

3 By Exercise 6, §1.5 the flow is c[x dy dz +
y dz dx + z dx dy]. The constant c is to be determined by the condition that the magnitude of
the flow is proportional to $1/r^2$, say $k/r^2$. If
y = z = 0 the magnitude of the flow is cx,
hence $c = k/(xr^2) = k/r^3$ in this case. By analogy with Exercise 2 this leads to the enlightened
guess $k(x\,dy\,dz + y\,dz\,dx + z\,dx\,dy)/r^3$ (see
Exercise 2, §8.3).


4 Flow from a source at 0 is the function
$\operatorname{sgn} x = x/\sqrt{x^2}$. A 0-form is a function, and a
constant 0-form is a number.


§2.2 pages 27-28


1 $\sum(n) = 4n[(2n + 1)^{-2} + (2n + 3)^{-2} + \cdots + (4n - 1)^{-2}]$. The term $(25/20)^{-2}(1/10)$
of $\sum(10)$ corresponds to the interval {12/10 ≤
x ≤ 13/10} and hence to the two terms
$(49/40)^{-2}(1/20) + (51/40)^{-2}(1/20)$ of $\sum(20)$.
The difference is estimated, using the given inequality, to be at most
$$|(25/20)^{-2}(1/20) - (49/40)^{-2}(1/20)| + |(25/20)^{-2}(1/20) - (51/40)^{-2}(1/20)| \le (1/20)\cdot 2\cdot(1/40) + (1/20)\cdot 2\cdot(1/40) = 1/200.$$
Applying the same estimate on each of the ten
intervals of $\sum(10)$ gives $|\sum(10) - \sum(20)| \le 1/20 = .05$. The resulting numbers differ by at
most .05, i.e., by ±5 in the last place. A similar
argument (10 = N, m = 2) gives $|\sum(N) - \sum(mN)| \le N\cdot m\cdot(1/mN)\cdot 2\cdot(1/2N) = 1/N$.
Thus N = 200 has the desired property. If
n ≥ 200 then $|\sum(n) - \sum(200)| \le |\sum(n) - \sum(200n)| + |\sum(200n) - \sum(200)| \le (1/n) + (1/200) \le .01$. Therefore $\sum(n)$ rounded to two
places differs from $\sum(200)$ rounded to two
places by at most 1 in the last place.
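These estimates can be checked numerically. Reading Σ(n) as the midpoint approximating sum for $\int_1^2 x^{-2}\,dx$ with n equal subintervals (the jth midpoint is (2n + 2j − 1)/(2n), which reproduces the formula for Σ(n)), the exact value 1/2 serves as a check. A Python sketch with a hypothetical function name:

```python
def sigma(n: int) -> float:
    """Sigma(n) = 4n[(2n+1)^-2 + (2n+3)^-2 + ... + (4n-1)^-2]."""
    return 4 * n * sum((2 * n + 2 * j - 1) ** -2 for j in range(1, n + 1))


print(abs(sigma(10) - sigma(20)) <= 0.05)  # True: within the bound 1/20 derived above
print(round(sigma(200), 2))                # 0.5, consistent with the exact value 1/2
```

The computed differences are far smaller than the bounds; the argument above only needs bounds that are easy to prove, not sharp ones.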

2 (a) By the definition of rounding, $|a - a_3| \le .0005$. Thus $|a_3 - t_3| \le |a_3 - a| + |a - t| + |t - t_3| < .0005 + .001 + .0005 = .002$. But
$a_3$, $t_3$ are integer multiples of .001 and two integers which differ by less than 2 differ by at
most ±1. (b) Take a = .0004999999 and
t = .0005000001.


3 The argument of 1 gives $(x_1^{-1} - x_2^{-1})/(x_1 - x_2) \ge -1$, $|1/x_1 - 1/x_2| \le |x_1 - x_2|$.
Then $|\sum(N) - \sum(mN)| \le N\cdot m\cdot(1/mN)\cdot(1/2N) = 1/(2N)$. Therefore N = 1,000 has the
desired property.


4 The vertices of the n-gon lie at $(\cos 2\pi j/n, \sin 2\pi j/n)$ (j = 1, 2, ..., n). Between the jth
and (j + 1)st vertices the term in the approximating sum is
$$\cos(2\pi(j + \tfrac12)/n)[\sin(2\pi(j + 1)/n) - \sin(2\pi j/n)] - \sin(2\pi(j + \tfrac12)/n)[\cos(2\pi(j + 1)/n) - \cos(2\pi j/n)]$$
$$= \cos(2\pi(j + \tfrac12)/n)\cdot 2\cos(2\pi(j + \tfrac12)/n)\sin(\pi/n) - \sin(2\pi(j + \tfrac12)/n)\cdot(-2)\sin(2\pi(j + \tfrac12)/n)\sin(\pi/n) = 2\sin(\pi/n).$$
There are n such terms, hence $2n\sin(\pi/n)$ in all. As $n \to \infty$ the limit
is 2π.
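A numerical sketch of this computation in Python: each term should equal 2 sin(π/n) up to rounding, and the total should approach 2π as n grows. The function name is hypothetical.

```python
import math


def circle_sum(n: int) -> float:
    """Approximating sum for the integral of x dy - y dx over the unit circle,
    using the vertices (cos 2*pi*j/n, sin 2*pi*j/n) and evaluating x, y at the
    angle halfway between consecutive vertices."""
    total = 0.0
    for j in range(1, n + 1):
        a = 2 * math.pi * j / n
        b = 2 * math.pi * (j + 1) / n
        mid = (a + b) / 2
        total += math.cos(mid) * (math.sin(b) - math.sin(a)) - math.sin(mid) * (
            math.cos(b) - math.cos(a)
        )
    return total


for n in (6, 60, 600):
    print(n, circle_sum(n), 2 * n * math.sin(math.pi / n))
```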


5 As in 1 and 3, one must estimate $|x_1^3y_1^2 - x_2^3y_2^2|$. For example the estimate
$$|x_1^3y_1^2 - x_2^3y_1^2 + x_2^3y_1^2 - x_2^3y_2^2| \le y_1^2|x_1 - x_2|\,|x_1^2 + x_1x_2 + x_2^2| + x_2^3|y_2 - y_1|\,|y_2 + y_1| \le 12|x_1 - x_2| + 4|y_1 - y_2|$$
can be used. Then $|\sum(Nn, Mm) - \sum(n, m)| \le 2\cdot[12\cdot(2n)^{-1} + 4\cdot m^{-1}] = (12/n) + (8/m)$. From this it can be shown that
N = 400,000 has the desired property.


§2.3 pages 34-38 


1 $A_{10} = 3.14$, $A_{20} = 3.135$. Further subdivision would at worst cause all uncertain squares
to be omitted. This guarantees that $A_{10}$ is within
.38 (less than one decimal place) and that $A_{20}$
is within .195 (still not one decimal place) of π.
Consulting an accurate value of π (see §7.5) it is
seen that $A_{10}$ is correct to 2 decimal places and
$A_{20}$ is correct to almost 2 places. In general
$U_n = 2n - 1$ because the circle crosses n − 1
lines x = const. and n − 1 lines y = const. in
going from (n, 0) to (0, n). To guarantee
$2n^{-2}U_n < .005$ one would therefore need
n > 800. This is much too large because all uncertain squares are not omitted but, rather, only
about half of them, which is allowed for in the
formula for $A_n$.
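The counting argument can be sketched in Python for the quarter circle of radius n on a grid of unit squares. Here a square counts as "inside" when its far corner lies in the closed disk and as "uncertain" when the circle passes through its interior; the estimate that counts uncertain squares at half weight is an assumption about the form of $A_n$, not the text's definition. The sketch also shows that 2n − 1 is attained exactly when the circle passes through no lattice point strictly between (n, 0) and (0, n): for n = 10 it finds 17 uncertain squares, because the circle passes through (6, 8) and (8, 6).

```python
import math


def classify_squares(n: int):
    """Classify the unit squares [i, i+1] x [j, j+1] (0 <= i, j < n) against the
    quarter disk x^2 + y^2 <= n^2.  Returns (inside, uncertain), where 'inside'
    means the far corner lies in the closed disk and 'uncertain' means the circle
    passes through the square's interior (near corner strictly inside, far corner outside)."""
    inside = uncertain = 0
    r2 = n * n
    for i in range(n):
        for j in range(n):
            near = i * i + j * j
            far = (i + 1) ** 2 + (j + 1) ** 2
            if far <= r2:
                inside += 1
            elif near < r2:
                uncertain += 1
    return inside, uncertain


def estimate_pi(n: int) -> float:
    """Hypothetical A_n: uncertain squares counted at half weight, times 4 quadrants."""
    inside, uncertain = classify_squares(n)
    return 4 * (inside + uncertain / 2) / n**2


for n in (7, 10, 20):
    print(n, classify_squares(n)[1], 2 * n - 1)
```

Because the true quarter-disk area lies between the inside count and the inside-plus-uncertain count, this estimate is within $2U_n/n^2$ of π.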


2 (a) Ns = 81, Ny = 21. (b) Definition
of approximating sum. (c) In the first quadrant
the circle crosses at most r lines x = const. and
at most r lines y = const., hence there are at
most 2r + 1 squares on the boundary in the




first quadrant or at most 8r + 4 in all. (d) In
refining, only the squares in (c) are uncertain.


3 There are 8n − 4 squares on the boundary.
If each is enlarged δ in all 4 directions their total
area is $(8n - 4)(n^{-1} + 2\delta)^2$. If S is a subdivision
with mesh size |S| < δ then $U(S) \le (8n - 4)(n^{-1} + 2\delta)^2$. Let $\delta = n^{-1}$. Then
$U(S) \le 9(8n - 4)/n^2$. Given ε > 0 choose n
so large that $9(8n - 4)n^{-2} < \varepsilon$. Then U(S) < ε
whenever $|S| < n^{-1}$, as was to be shown.


4 Immediate 


5 $|L - \sum(\alpha)| \le |L - \sum(\alpha')| + |\sum(\alpha') - \sum(\alpha)| < \varepsilon + U(S)$.


6 For the first reduction take a third rectangle
containing both given rectangles. For the second
note that if $A_D$ is zero outside D then it is zero
outside R′. In the last part it suffices to note that
if $\sum(\alpha)$ is an approximating sum to $\int_R$ then
there is an approximating sum $\sum'(\alpha')$ to $\int_{R'}$
which differs from $\sum(\alpha)$ by at most $M\cdot|\alpha|\cdot$(perimeter of R′), where |α| is the mesh size and
M is a bound on A. Thus $\sum'(\alpha') \to L$ as
$|\alpha'| \to 0$ implies $\sum(\alpha)$ converges (to L) as
$|\alpha| \to 0$.


7 ‘Only if’ is immediate. To prove ‘if,’ split
U(S) as directed. The first part is at most
σ·area(R). Set σ = ε/[2 area(R)]. Let M be
a bound for A on R. Choose δ so small that
s(S, σ) < ε/(4M) whenever |S| < δ. Then
U(S) ≤ σ·area(R) + 2M·s(S, σ) < ε/2 +
ε/2 = ε.


8 The denial of the Cauchy Criterion is: There
is an ε > 0 such that for every δ > 0 there exist
approximating sums $\sum(\alpha)$, $\sum(\alpha')$ such that
|α| < δ, |α′| < δ, and such that $|\sum(\alpha) - \sum(\alpha')| \ge \varepsilon$.


9 Approximating sums are formed as in the
case of rectangular subdivisions except that area
$(R_{ij}) = (x_i - x_{i-1})(y_j - y_{j-1})$ must be replaced by the formula for the area of a parallelogram (§1.3). The proof of convergence is exactly
the same as in the previous case. The proof that
$U(S) \to 0$ as $|S| \to 0$ can be made rigorous by
giving an explicit bound on the number of
parallelograms which lie on the boundary of a
rectangle.


10 Polygonal regions are used because only 
for them can ‘area’ be defined algebraically. 
Since arbitrary sums converge (by the same 
argument as before) and since rectangular sums 
converge to $\int_D A\,dx\,dy$, it follows that the limiting value of arbitrary sums is $\int_D A\,dx\,dy$.


§2.4 pages 43-44 


1 (a) uv? cos? uv du dv 
(b) —(u+v)u* cos v dudo (c) —16e" du dv 


2 (3/2)t!/?. The integral is thus f$ (3/Dt!”? dt. 
3 The line through (u, v, 0) and (0, 0, 1) can be
parameterized x = tu, y = tv, z = 1 − t. Set
$x^2 + y^2 + z^2 = 1$ and solve for t. The pullback is $-4(u^2 + v^2 + 1)^{-2}$ du dv.

4 cos φ dθ dφ.


5 Set x = cos θ, y = sin θ so that x dy −
y dx = cos² θ dθ + sin² θ dθ = dθ and the integral is $\int_0^{2\pi} d\theta$.


6 dx dy = r dr dθ. The area is approximately
2πr times Δr, where r is the radius of the ring and
Δr its width. If r < 0 the map (r, θ) → (x, y) reverses orientations.


§2.5 pages 48-49 


1 x = u, y = v, z = 1 − u − v oriented
du dv.

2 Spherical coordinates:

dy dz = cos θ cos² φ dθ dφ,

dz dx = sin θ cos² φ dθ dφ,

dx dy = sin φ cos φ dθ dφ. Projection on xy-plane. To avoid confusion let (u, v) denote coordinates on the plane, (x, y, z) coordinates on
space. Then $dy\,dz = \pm u(1 - u^2 - v^2)^{-1/2}\,du\,dv$,
$dz\,dx = \pm v(1 - u^2 - v^2)^{-1/2}\,du\,dv$, dx dy =
du dv. Stereographic projection: $dy\,dz = 8u(u^2 + v^2 + 1)^{-3}\,dv\,du$, $dz\,dx = 8v(u^2 + v^2 + 1)^{-3}\,dv\,du$, $dx\,dy = 4(u^2 + v^2 - 1)(u^2 + v^2 + 1)^{-3}\,dv\,du$.


3 $\int_a^b f(x)\,dx = \int_\alpha^\beta f(x(u))(dx/du)\,du$ if the
map u → x preserves the orientation of the interval, i.e. carries α to a and β to b. If it reverses the orientation then there is a minus sign.
In either case the formula $\int_{x(\alpha)}^{x(\beta)} f(x)\,dx = \int_\alpha^\beta f(x(u))(dx/du)\,du$ holds. Thus $\int_a^b x^n\,dx = \int_{\log a}^{\log b} e^{nu}e^u\,du = \int_{\log a}^{\log b} e^{(n+1)u}\,du$.


4 At (θ, φ) = (0, 0) the pullbacks are dx =
−sin θ cos φ dθ − cos θ sin φ dφ = 0, dy = dθ,
dz = dφ, x dy dz + y dz dx + z dx dy = dθ dφ.


5 At (u, v) = (0, 0) the x and y coordinates
are 0 so only the term z dx dy need be computed.
Since $dx = 2\,du/(1 + u^2 + v^2) + 0 = 2\,du$
and dy = 2 dv the answer is z dx dy = −4 du dv.


6 The top end is parameterized by x = u,
y = v, z = 1 on {u² + v² ≤ 1} oriented du dv,
the bottom end by x = u, y = v, z = −1 on
{u² + v² ≤ 1} oriented dv du, and the sleeve




by x = cos θ, y = sin θ, z = h on {0 ≤ θ ≤ 2π,
−1 ≤ h ≤ 1} oriented dθ dh.

7 x = (2 + cos u) cos v, y = (2 + cos u) sin v,
z = sin u on {0 ≤ u ≤ 2π, 0 ≤ v ≤ 2π} oriented
dv du.


§2.6 page 51 


1 (i) is part of the definition of $\int_R A\,dx\,dy$.
Since A is assumed to be continuous the integrals in (ii) all converge and the desired equation
follows from the fact that arbitrarily fine approximating sums to $\int_{R_1+R_2} A\,dx\,dy$ split into
the sum of an approximating sum to $\int_{R_1} A\,dx\,dy$
and one to $\int_{R_2} A\,dx\,dy$. (iii) and (iv) follow from
the fact that approximating sums to the two
sides are identical. (v) is easily proved using the
continuity of A and the fact that if m ≤
A(x, y) ≤ M throughout R then m·area(R) ≤
$\int_R A\,dx\,dy$ ≤ M·area(R), which in turn follows from the fact that any approximating sum
to $\int_R A\,dx\,dy$ satisfies these inequalities.

2 Let $\sum(\alpha) = \sum_j [\int_a^b A(x, y_j)\,dx]\,\Delta y_j$. Choose
a subdivision of {a ≤ x ≤ b} into intervals
shorter than δ, and choose points $x_i$ in each
interval. By Exercise 5, §2.3, it follows both that
$\int_a^b A(x, y_j)\,dx$ differs by less than ε(b − a) from
$\sum_i A(x_i, y_j)\,\Delta x_i$ and that $\int_R A\,dx\,dy$ differs
by less than ε(b − a)(d − c) from
$$\sum_j \sum_i A(x_i, y_j)\,\Delta x_i\,\Delta y_j.$$
Thus $\sum(\alpha)$ differs by less than 2ε(b − a)(d − c)
from $\int_R A\,dx\,dy$ whenever |α| < δ, hence
$\sum(\alpha) \to \int_R A\,dx\,dy$ as $|\alpha| \to 0$, q.e.d.


§3.1 pages 56-58


1 F is the potential function so that F'(t) dt 
gives the amount of work required for small 
(infinitesimal) displacements (see §8.8). The 
total amount of work required is expressed by 
(1) as a sum of small amounts.


2 F(b) — F(a) is the total flow out of the in- 
terval {a < x < b} across the ends a and b. 
Thus F’(t) dt gives the amount of fluid emanat- 
ing from small (infinitesimal) intervals. This is 
called the divergence of the flow F(t). Equa- 
tion (1) gives the total divergence as a sum of 
small divergences. 

3 F'(t) is velocity, F'(t) dt gives the displacement which occurs during small (infinitesimal)
time intervals, and equation (1) says that the 
total displacement can be written as a sum of 
small displacements. 


4 Since F(t + Δt) − F(t) is the area under the
curve which lies over the interval (t, t + Δt),
it is approximately equal to f(t) Δt. As Δt → 0
the approximation improves and F'(t) = f(t).
Hence F'(t) dt = f(t) dt gives the amount of
area lying over small (infinitesimal) intervals of
the t-axis and (1) says that the total area can be
expressed as a sum of small areas.
5 F(n) − F(n − 1) = C sin(nA + D) where
C = −2 sin(A/2) and D = B − (A/2). Thus
sin D + sin(A + D) + ··· + sin(nA + D) =
[F(n) − F(−1)]/C, which gives
$$\sin\beta + \sin(\alpha + \beta) + \cdots + \sin(n\alpha + \beta) = \frac{\cos(n\alpha + \beta + (\alpha/2)) - \cos(\beta - (\alpha/2))}{-2\sin(\alpha/2)}.$$
6 $\dfrac{\sin(n\alpha + \beta + (\alpha/2)) - \sin(\beta - (\alpha/2))}{2\sin(\alpha/2)}$
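A quick numerical check of the sine-sum formula of Exercise 5 in Python; the sample values of α, β, n are arbitrary and the function names are illustrative.

```python
import math


def sine_sum(alpha: float, beta: float, n: int) -> float:
    """sin(beta) + sin(alpha + beta) + ... + sin(n*alpha + beta), summed directly."""
    return sum(math.sin(k * alpha + beta) for k in range(n + 1))


def sine_sum_closed(alpha: float, beta: float, n: int) -> float:
    """Closed form [cos(n*alpha + beta + alpha/2) - cos(beta - alpha/2)] / (-2 sin(alpha/2))."""
    return (math.cos(n * alpha + beta + alpha / 2) - math.cos(beta - alpha / 2)) / (
        -2 * math.sin(alpha / 2)
    )


print(math.isclose(sine_sum(0.3, 0.7, 10), sine_sum_closed(0.3, 0.7, 10)))  # True
```

The closed form telescopes because 2 sin(α/2) sin(kα + β) = cos(kα + β − α/2) − cos(kα + β + α/2).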
7 $F(n) - F(n - 1) = (r - 1)r^{n-1}$. Therefore
$F(n) - F(0) = (r - 1)[1 + r + r^2 + \cdots + r^{n-1}]$
and the desired sum is $[F(n) - F(0)]/(r - 1) = (r^n - 1)/(r - 1)$.


8 $n^2 = F(n) - F(0) = [1 + 3 + \cdots + (2n - 1)] = -n + 2[1 + 2 + 3 + \cdots + n]$ so
the desired sum is $\tfrac12(n^2 + n)$.
9 $n^3 = F(n) - F(0) = [1 + 7 + 19 + \cdots + (3n^2 - 3n + 1)] = n - \tfrac32(n^2 + n) + 3[1^2 + 2^2 + 3^2 + \cdots + n^2]$ so the desired sum is
$\tfrac13[n^3 - n + \tfrac32(n^2 + n)] = \tfrac13n^3 + \tfrac12n^2 + \tfrac16n = (2n^3 + 3n^2 + n)/6 = n(n + 1)(2n + 1)/6$.
10 $n^4 = 4[1^3 + 2^3 + \cdots + n^3] - 6[1^2 + 2^2 + \cdots + n^2] + 4[1 + 2 + \cdots + n] - n$, which gives, when the formulas
of Exercises 8, 9 are used, the formula $1^3 + 2^3 + \cdots + n^3 = n^2(n + 1)^2/4$.
11 As in 10 one obtains $n^{k+1} = (k + 1)[1^k + 2^k + \cdots + n^k] - \binom{k+1}{2}[1^{k-1} + 2^{k-1} + \cdots + n^{k-1}] + \cdots$. By induction, the sums on the
right can be written, with the exception of the
first, as polynomials in n of degree ≤ k and the
desired conclusion follows.
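The recursion in Exercise 11 can be turned into a small exact computation. Summing $m^{k+1} - (m - 1)^{k+1}$ over m = 1, ..., n gives $n^{k+1} = \sum_{j=0}^{k} (-1)^{k-j}\binom{k+1}{j}(1^j + \cdots + n^j)$, which can be solved for the kth power sum once the lower ones are known. A Python sketch using exact fractions (function names are illustrative):

```python
from fractions import Fraction
from math import comb


def power_sum_poly(k: int):
    """Coefficients c[0], ..., c[k+1] with 1^k + 2^k + ... + n^k = sum(c[i] * n**i),
    obtained from n^(m+1) = sum_{j=0}^{m} (-1)^(m-j) C(m+1, j) S_j(n) for m = 0, ..., k."""
    polys = []  # polys[j] holds the coefficient list of S_j
    for m in range(k + 1):
        coeffs = [Fraction(0)] * (m + 2)
        coeffs[m + 1] = Fraction(1)  # start from n^(m+1)
        for j in range(m):
            # move the lower power sums to the other side of the identity
            factor = (-1) ** (m - j) * comb(m + 1, j)
            for i, c in enumerate(polys[j]):
                coeffs[i] -= factor * c
        # the remaining term is (m + 1) S_m
        polys.append([c / (m + 1) for c in coeffs])
    return polys[k]


def evaluate(coeffs, n: int):
    """Value of the polynomial with the given coefficients at n."""
    return sum(c * n**i for i, c in enumerate(coeffs))


print([str(c) for c in power_sum_poly(2)])  # ['0', '1/6', '1/2', '1/3']
print(evaluate(power_sum_poly(3), 10))      # 3025
```

The k = 2 coefficients reproduce $\tfrac13n^3 + \tfrac12n^2 + \tfrac16n$ from Exercise 9, and k = 3 gives $n^2(n + 1)^2/4$ from Exercise 10.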
12 The formula of 5 gives $\sum \sin(j/N)N^{-1} = (\cos B - \cos A)/(-2N\sin((2N)^{-1}))$ where B is
near b and A is near a. As N → ∞ the limit is
−[cos b − cos a]. The others are similar.
13 By independence of parameter F(b) −
F(a) = $\int_{F(a)}^{F(b)} dy = \int_a^b (dy/dx)\,dx = \int_a^b F'(x)\,dx$.


§3.2 pages 62-65


1 (a) A constant downward flow. No diver- 
gence. (b) A downward flow in the upper half 




plane and an upward flow in the lower half 
plane. Flow is converging at all points. (c) A 
downward flow in the right half plane, upward 
in the left. A shear with no divergence. (d) A 
flow from left to right at all points, converging 
in the left half plane, diverging in the right. 
(e) A vortical flow, no divergence. (f) A radial 
flow diverging at all points. 

2 The divergence is identically zero as is easily 
checked. Cutting a small hole {x² + y² < ε²}
out of a domain containing the origin and apply- 
ing the divergence theorem to the remaining 
domain shows that the flow across the boundary 
of the original domain is equal to the flow across 
a small circle {x² + y² = ε²}. This flow is
easily seen, by parameterizing the circle
x = ε cos θ, y = ε sin θ, to be 2π.

3 —dF is the 1-form which describes the 
central force field. 


4 If S is the line segment {a ≤ x ≤ b, y = c}
and if G(x) = F(x, c) then G'(x) = (∂F/∂x)(x, c) and $\int_S ((\partial F/\partial x)\,dx + (\partial F/\partial y)\,dy) = \int_a^b G'(x)\,dx = G(b) - G(a) = F(b, c) - F(a, c)$
as was to be shown.

5 The integral is $(x + yô) in all three cases. 
The flow has no divergence so the total flow 
across any barrier from (0, 0) to (xo, yo) is the 
same as the total flow across any other. 


6 If $P_0, P_1, P_2, P_3$ are the vertices of the rectangle
then the integrals over the sides are $F(P_1) - F(P_0)$, $F(P_2) - F(P_1)$, $F(P_3) - F(P_2)$, $F(P_0) - F(P_3)$ for a total of zero. Thus by (2) the integral
of the 2-form
$$[(\partial/\partial x)(\partial F/\partial y) - (\partial/\partial y)(\partial F/\partial x)]\,dx\,dy$$
over any rectangle is zero. By (v) of §2.6 this
implies that $[(\partial/\partial x)(\partial F/\partial y) - (\partial/\partial y)(\partial F/\partial x)]$
is identically zero.

7 $F_1 = F_2$ because their difference is the integral of A dx + B dy around the boundary of
a rectangle which, by (2), is zero. It is
easily shown that $(\partial F_1/\partial y) = B$ and that
$(\partial F_2/\partial x) = A$.

8 (a) Exact. $F = \tfrac12(x + y)^2$. (b) Exact. F = xy.
(c) Exact. $F = (x^2 + y^2)$. (d) Exact. $F = e^x\cos y$. (e) Exact. F = x log(xy). (f) Exact.
$F = \tfrac12\log(x^2 + y^2)$. (g) Closed but not exact.
9 The first statement is proved by the method 
of Exercise 6. No, the force field described by 
the 1-form 8(g) has this property but is not con- 
servative. Prove the last statement by the method 
of Exercise 7. 


10 dx = cos θ dr − r sin θ dθ, dy = sin θ dr +
r cos θ dθ, x dy = r cos θ sin θ dr + r² cos² θ dθ,
x dy − y dx = r² dθ, (x dx + y dy)/(x² + y²) =
dr/r, (x dy − y dx)/(x² + y²) = dθ. The first
two are exact, the next two are not closed, the 
fifth is exact, and the sixth is closed and not 
exact. The same is true of the pullbacks except 
that the pullback of the last one is exact. 

11 $\int_{\partial D} A\,dx + B\,dy = \int [(A\cos\theta + B\sin\theta)\,dr + (-rA\sin\theta + rB\cos\theta)\,d\theta]$. This
can be considered as an integral around the
boundary of the rectangle {0 ≤ r ≤ 1, 0 ≤
θ ≤ 2π}. Applying formula (2) and simplifying
gives $\int((\partial B/\partial x) - (\partial A/\partial y))\,r\,dr\,d\theta$ over the rectangle, which is (by definition) $\int_D ((\partial B/\partial x) - (\partial A/\partial y))\,dx\,dy$.

12 Let R be the region bounded by the curve S,
the lines x = a, x = b, and the x-axis, oriented
so that S is part of ∂R. Then $\int_S y\,dx = \int_{\partial R} y\,dx$
because dx is zero on x = const. and because y
is zero on the x-axis. Thus $\int_S y\,dx = \int_R dy\,dx$ =
area of R if S lies above the axis and = −area
of R if S lies below the axis. Of course $\int_S y\,dx = \int_a^b f(x)\,dx$.


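13
$$\tfrac12(x_{i-1}y_i - x_iy_{i-1}) = \tfrac12\left[\left(\frac{x_i + x_{i-1}}{2}\right)(y_i - y_{i-1}) - \left(\frac{y_i + y_{i-1}}{2}\right)(x_i - x_{i-1})\right] = \tfrac12[\bar x\,(\Delta y) - \bar y\,(\Delta x)].$$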
14 By 13, $\pi = \int_{\partial D} \tfrac12(x\,dy - y\,dx)$ where D is
the disk {x² + y² ≤ 1} oriented dx dy. The parameterization of the circle by x = 2u/(u² + 1),
y = (u² − 1)/(u² + 1) converts this integral to
the given integral (see §7.5).


§3.3 pages 70-72


1 This is flow from a source at (0, 0, 0), hence 
the divergence should be zero. 

2 (a) 4r/3 (b)4r/3 (c)4r? 

3 The force is dy dz, hence its total is
$\int_{\partial D} dy\,dz = \int_D d(dy\,dz) = \int_D 0 = 0$.

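4
$$d(A\,dx + B\,dy) = d\left[A\left(\frac{\partial x}{\partial r}dr + \frac{\partial x}{\partial\theta}d\theta\right) + B\left(\frac{\partial y}{\partial r}dr + \frac{\partial y}{\partial\theta}d\theta\right)\right]$$
$$= \left[\frac{\partial}{\partial r}\left(A\frac{\partial x}{\partial\theta} + B\frac{\partial y}{\partial\theta}\right) - \frac{\partial}{\partial\theta}\left(A\frac{\partial x}{\partial r} + B\frac{\partial y}{\partial r}\right)\right]dr\,d\theta$$
$$= \left[\frac{\partial A}{\partial r}\frac{\partial x}{\partial\theta} + \frac{\partial B}{\partial r}\frac{\partial y}{\partial\theta} - \frac{\partial A}{\partial\theta}\frac{\partial x}{\partial r} - \frac{\partial B}{\partial\theta}\frac{\partial y}{\partial r}\right]dr\,d\theta.$$
$$dA\,dx + dB\,dy = \left(\frac{\partial A}{\partial r}dr + \frac{\partial A}{\partial\theta}d\theta\right)\left(\frac{\partial x}{\partial r}dr + \frac{\partial x}{\partial\theta}d\theta\right) + \left(\frac{\partial B}{\partial r}dr + \frac{\partial B}{\partial\theta}d\theta\right)\left(\frac{\partial y}{\partial r}dr + \frac{\partial y}{\partial\theta}d\theta\right)$$
$$= \left[\frac{\partial A}{\partial r}\frac{\partial x}{\partial\theta} + \cdots\right]dr\,d\theta.$$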




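5 (∂H/∂t) = k dE, i.e. $(\partial H_1/\partial t) = k((\partial E_3/\partial y) - (\partial E_2/\partial z))$, $(\partial H_2/\partial t) = k((\partial E_1/\partial z) - (\partial E_3/\partial x))$,
$(\partial H_3/\partial t) = k((\partial E_2/\partial x) - (\partial E_1/\partial y))$.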

6 Radial lines emanating from the origin, of 
equal density in all directions. The derived 
2-form is the 3-form which assigns to each por- 
tion of space the number of lines of force which 
end in it. [The number which enter minus the 
number which leave equals the number which
terminate inside minus the number which 
originate inside.] dH = 0 means that magnetic 
lines of force never terminate. Electrical lines of 
force terminate on charges. 

7 (a) is a simple computation. (b) depends
on the observation that a boundary has no 
boundary. 


8 Same as 7 


9 The 1-form (x dy − y dx)/(x² + y²) is
closed (where defined, i.e. except on the z-axis) 
but not exact. The 2-form of Exercise 1 is closed 
but not exact. 


10 If $P = (x_P, y_P, z_P)$, $Q = (x_Q, y_Q, z_Q)$ then
the value of dy dz on the triangle OPQ is
$\tfrac12(y_Pz_Q - y_Qz_P)$. If P, Q lie on a line x = const.,
z = const. this is $\tfrac12 z(y_P - y_Q) = -\tfrac12 z\,dy$.
Similarly if P, Q lie on a line x = const.,
y = const. the value is $\tfrac12 y\,dz$. Thus the proposed
1-form is $\tfrac12(y\,dz - z\,dy)$.


§3.4 pages 74-75 


1 (a) dω = 0 (b) dω = 3 dx dy dz (c) dω
= $e^{xyz}$[yz dx + zx dy + xy dz] (d) dω =
sin x dy dx + cos x dx dz (e) dω = 2(x + y) ×
[dx dy + dx dz + dy dz] (f) dω = (dx/x)
(g) dω = cos x dx (h) dω = 2x dx (i) dω =
dx

2 The equation $\int_{\partial R} x\,dy\,dz = \int_R dx\,dy\,dz$ dictates that the face x = const. with the larger
x-coordinate must be oriented by dy dz. 


3 See Exercises 7 and 8 of §3.3. $\int_R d(d\omega) = \int_{\partial R} d\omega = \int_{\partial\partial R} \omega = 0$ for all R, hence
d(dω) = 0.

4 Simple differentiation gives $(\partial F/\partial x) = 2y(x^4 + 4x^2y^2 - y^4)/(x^2 + y^2)^2$. This is
continuous by the same argument which proves
that F is continuous. On the line x = 0 it is
−2y, hence (∂/∂y)(∂F/∂x) exists at (0, 0) and
is −2. In the same way (∂/∂x)(∂F/∂y) exists at
(0, 0) and is 2. The second partials have not
been proved to be continuous, so equality of
second partials is not contradicted.


5 The function must be constant because 
F(Q) − F(P) = ∫ dF = 0. If the domain consists of two pieces then F can have one constant
value on one piece and another on the other so 
that dF = 0 without F = const. 

6 The method of Exercise 7, §3.2 proves that 
ω = dF for some function F. The 1-form
ω = (x dy − y dx)/(x² + y²) is the usual
counterexample. See §8.6 for a continuation of
this topic. 

7 Straightforward 

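8 $\int_S d(A\,dx + B\,dy) = \int_S ((\partial B/\partial x) - (\partial A/\partial y))\,dx\,dy = \int_0^1\int_0^{1-y} (\partial B/\partial x)\,dx\,dy - \int_0^1\int_0^{1-x} (\partial A/\partial y)\,dy\,dx = \int_0^1 B(1 - y, y)\,dy - \int_0^1 B(0, y)\,dy + \int_0^1 A(x, 0)\,dx - \int_0^1 A(x, 1 - x)\,dx = \int_{\partial S}(A\,dx + B\,dy)$.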

9 The pullback of dω is d of the pullback of
ω; hence by Exercise 8 the integral of the pullback of dω is the integral of the pullback of ω
around the boundary. 


10 Subdivide the polygonal surface into tri- 
angles, apply Exercise 9 to each triangle, and 
use cancellation on interior boundaries. The 
proof is necessarily sketchy because the notion 
of ‘polygonal surface’ has not been made 
precise. 


§4.1 pages 84-86


1 (a) x = −u + 2v + 7
y = 2u − 3v − 11
(r = 2)
(b) t = y − 2
x = 2y − 3
z = 4y − 11

(r = 1 and any of the three variables x, y, z 
can be moved to the right side.) 

(c) x = V—2y+2z2-—- 7-4 
(r = 1 and any of the four variables x, y, Z, t 
can be moved to the left side.) 

(d) y = 4v + ĝu +3 

z = 4v — łu + 2x + ź 

(r = 2 and (x, y) can be moved to the left, but 
(x, z) cannot.) 


— 1 1 5 

(e) p = —3c¢ + 3a — 3 
— 3 1 7 
q= 9C— gat 3 
b = —3c + 3a — 12 


(r = 2 and any two of the three variables a, b, 
c can be moved to the right.) 
(f) x = Su — xv + 32 - 4 
y= u+ tv + 4t — $ 
(r = 2. The pairs (x, ®, (y, z), (z, £) can be 
moved left, but the pairs (x, z), (y, cannot.) 




(g)p = 2x— Sy— z+ 16 
q= 9x — 23y — 424+ 7] 


r —4x + lly + 2z — 32 
(r = 3) 
(hy x = gv + sw — 2z 
y= -ýt tyw- z 
u = — 73e + 3V 


(r = 2. Any pair on the right can be moved left 
and vice versa.) 


2 x = −u − v + 3
y = −4u − 3v + 11
3 x = 4u − 6v + 4
y = 4u − 8v + 4
z = 6u − 9v − 3
t = 2u − 4v + 4


The a;;, b; can be chosen in 6 ways, depending 
on which of the points (0,0), (1,0), (1, 1) is 
assigned to each of the vertices of the triangle. 
dx dy = —8 du dv, dx dz = 0, dx dt = —4 du dv, 
dy dz = 12 du dv, dy dt = 0, dz dt = —6 du dv. 
Thus uv can be eliminated using any pair but 
(x, z) or (y, t). This is evidenced by the fact 
that the triangle projects to a line in the xz- 
plane and in the yr-plane. 
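The 2-form coefficients listed for Exercise 3 are the $2 \times 2$ minors of the coefficients of $(du, dv)$ in $dx, dy, dz, dt$. A short sketch, using Python only as a calculator, recomputes them from the four linear equations above and picks out the pairs that cannot be used to eliminate $(u, v)$:

```python
from itertools import combinations

# Coefficients of (du, dv) in dx, dy, dz, dt, read off from the equations above.
coeffs = {
    "x": (4, -6),
    "y": (4, -8),
    "z": (6, -9),
    "t": (2, -4),
}

def two_form(p, q):
    """dp dq = (a*d - b*c) du dv when dp = a du + b dv and dq = c du + d dv."""
    (a, b), (c, d) = coeffs[p], coeffs[q]
    return a * d - b * c

minors = {p + q: two_form(p, q) for p, q in combinations("xyzt", 2)}
print(minors)
print([pair for pair, value in minors.items() if value == 0])   # unusable pairs
```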


4 $x = -3u + v + w + 2$
$y = -3u - v + 3w + 4$
$z = 3u - 2v + w + 1$
$t = 3u - 3v + 4w - 1$

This can be done in 24 ways, depending on the order of the vertices. $dx\,dy\,dz = 6\,du\,dv\,dw$, $dx\,dy\,dt = 18\,du\,dv\,dw$, $dx\,dz\,dt = 3\,du\,dv\,dw$, $dy\,dz\,dt = 15\,du\,dv\,dw$. Any triple can be used to eliminate $(u, v, w)$.
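The same check works for Exercise 4: each 3-form coefficient is a $3 \times 3$ determinant of the coefficients of $(du, dv, dw)$. A sketch that recomputes the four values from the equations above:

```python
from itertools import combinations

# Coefficients of (du, dv, dw) in dx, dy, dz, dt, read off from the equations above.
rows = {
    "x": (-3, 1, 1),
    "y": (-3, -1, 3),
    "z": (3, -2, 1),
    "t": (3, -3, 4),
}

def det3(r1, r2, r3):
    """3 x 3 determinant by cofactor expansion along the first row."""
    a, b, c = r1
    d, e, f = r2
    g, h, i = r3
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Coefficient of du dv dw in each product of three of the differentials.
three_forms = {"".join(k): det3(*(rows[v] for v in k))
               for k in combinations("xyzt", 3)}
print(three_forms)
```

Since none of the four values is zero, every triple of variables can be used.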


5 Ifaiia22 — ai2a21 = 0 then asou — aiU 
is constant for fixed z, hence not all values of 
(u, v) correspond to values of (x, y). 


6 $u = (a_{22}/D)x - (a_{12}/D)y + (a_{12}b_2 - a_{22}b_1)/D$

$v = -(a_{21}/D)x + (a_{11}/D)y + (a_{21}b_1 - a_{11}b_2)/D$

$z = ((a_{31}a_{22} - a_{32}a_{21})/D)x + ((a_{32}a_{11} - a_{31}a_{12})/D)y + ((a_{32}a_{21} - a_{31}a_{22})/D)b_1 + ((a_{31}a_{12} - a_{32}a_{11})/D)b_2 + b_3$

where $D = a_{11}a_{22} - a_{12}a_{21}$. The condition $D \neq 0$ is necessary and sufficient for there to be a solution.


7 If and only if $a_{11}a_{22} - a_{12}a_{21}$, $a_{11}a_{23} - a_{13}a_{21}$, and $a_{12}a_{23} - a_{13}a_{22}$ are all zero. Geometrically this means that the map collapses $xyz$-space (the domain) to a line (the image) in the $uv$-plane (the range). The level surfaces are planes.


8 If and only if $a_{11}a_{22} - a_{12}a_{21}$, $a_{11}a_{32} - a_{12}a_{31}$, and $a_{21}a_{32} - a_{22}a_{31}$ are all zero. Then the map collapses the $uv$-plane (the domain) to a line (the image) in $xyz$-space (the range). The level surfaces are lines.


9 dpdr = 22dpdq, dqdr = —Ż dp dq. This 
follows immediately from dr = 2 dp + 2% dą. 


§4.2 pages 93-94 


1 (a) $7\,du\,dv\,dw\,dx - 2\,du\,dw\,dy\,dz$ plus thirteen terms with coefficient zero. (b) $-3\,du\,dv\,dw\,dy - 3\,du\,dv\,dw\,dz$ plus thirteen zero terms. (c) $2\,du\,dv\,dx\,dy$ plus fourteen zero terms. (d) $4\,du\,dv\,dw\,dy - du\,dv\,dx\,dz - 4\,du\,dx\,dy\,dz + dv\,dw\,dx\,dz - dv\,dx\,dy\,dz - 4\,dw\,dx\,dy\,dz$ (plus 9 zero terms).


2, 3, 4 Self-checking


5 There are 24 orders; half of them represent $dx\,dy\,dz\,dw$ and half $-dx\,dy\,dz\,dw$, e.g. $dx\,dy\,dz\,dw = dy\,dz\,dx\,dw = dz\,dx\,dy\,dw = dx\,dw\,dy\,dz = dw\,dy\,dx\,dz = \cdots$ and $-dx\,dy\,dz\,dw = dy\,dx\,dz\,dw = dx\,dz\,dy\,dw = \cdots$.
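The sign attached to each of the 24 orders is the parity of the permutation that restores $dx\,dy\,dz\,dw$, because each interchange of adjacent factors changes the sign. Parity can be read off from the number of inversions. A sketch that tabulates all 24 orders and confirms the 12/12 split and the examples above:

```python
from itertools import permutations

def parity_sign(perm):
    """+1 for an even permutation, -1 for an odd one (counted by inversions)."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

reference = {"x": 0, "y": 1, "z": 2, "w": 3}      # the order dx dy dz dw
signs = {"".join(p): parity_sign([reference[c] for c in p])
         for p in permutations("xyzw")}

print(sum(s == 1 for s in signs.values()), sum(s == -1 for s in signs.values()))
print(signs["yzxw"], signs["wyxz"], signs["yxzw"])
```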


§4.3 pages 103-105


om- G 2) 


—7 5 — 
7 3 5 12 
MaM, = (45 —4 4) 
_—/15 1 31 
MoMi = \ 5 -2 12) 




40 7 9 (e) Self-checking. For example, 
M, My =| —5 MaMe = (4 _s) 
45 


55 —301 —115 
(MM)? =( 55 —301 —115 
—110 602 230 

ERE = MẸ' MP. 


2 Interchanging the first two rows of a $3 \times 3$ matrix is the same as multiplying on the left by
$$\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}: \qquad \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} = \begin{pmatrix} d & e & f \\ a & b & c \\ g & h & i \end{pmatrix}.$$
Since the determinant of this matrix is $-1$ (it corresponds to $x = v$, $y = u$, $z = w$, hence $dx\,dy\,dz = -du\,dv\,dw$), the rule (a) follows for this case. Multiplying on the right by this matrix interchanges the first two columns. All cases of rule (a) can be proved in this way. The operation in (b) is the matrix form of the operation of composition with a map like $x = u + av$, $y = v$, $z = w$, for which $dx\,dy\,dz = du\,dv\,dw$. The operation in (c) is composition with a map like $x = cu$, $y = v$, $z = w$, for which $dx\,dy\,dz = c\,du\,dv\,dw$. If the first row is (1, 0, 0) then using (b) the first column can be made to be (1, 0, 0) without changing the other columns and without changing the determinant. The matrix then corresponds to an affine map of the form $x = u$, $y = av + bw$, $z = a'v + b'w$. In computing $dx\,dy\,dz = D\,du\,dv\,dw$ it suffices to find $dy\,dz = D\,dv\,dw$ and multiply by $dx = du$. The determinants are $-42$, 5, 45, 1, and $-2242$.


3 A rotation of 90° in a coordinate plane corresponds to the operation of interchanging two rows (or columns) and changing the sign of one of them. The corresponding matrix is a matrix which can be obtained from the identity matrix (1's in the $i$th row of the $i$th column for $i = 1, 2, \ldots, n$ and 0's elsewhere) by applying such an operation. A shear corresponds to the operation of adding a multiple of one row (or column) to another row (or column), and the corresponding matrix is a matrix which can be obtained from the identity matrix by applying such an operation. A scale factor corresponds to the operation of multiplying all entries of one row (or one column) by a given factor, and the corresponding matrix is a matrix which can be obtained from the identity matrix by such an operation. By composing with a translation it can be assumed that the constants in the given affine map are zero. By composing with rotations it can be assumed that the entry in the upper left hand corner of the matrix of coefficients is not zero (unless the matrix is identically zero, in which case the map is a scale factor of 0 in all coordinate directions). Applying a scale factor it can be assumed that it has a 1 in the upper left hand corner. Composing with shears it can be assumed that all other entries in the first row and column of its matrix of coefficients are 0. Then use induction. Very simply, the statement is that every determinant can be evaluated by the method of Exercise 2, a fact of which one is easily convinced by a few examples.
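The reduction in Exercise 3 (interchanges, shears, scale factors) is essentially Gaussian elimination. The sketch below is an illustrative implementation, not the book's procedure verbatim. An interchange multiplies the determinant by $-1$, a shear leaves it unchanged, and scaling a row by $1/p$ divides it by $p$, so the determinant is the product of the recorded factors once the identity is reached. The $3 \times 3$ test matrix is a hypothetical example, and its transpose gives the same value, as Exercise 4 asserts.

```python
from fractions import Fraction

def det_by_row_operations(matrix):
    """Reduce to the identity by interchanges, scalings and shears, tracking the determinant."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # interchange: bring a row with a nonzero entry in this column to the top
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)            # the column is zero below the diagonal: singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        # scale factor: make the pivot 1; the determinant picks up the pivot as a factor
        p = m[col][col]
        det *= p
        m[col] = [v / p for v in m[col]]
        # shears: clear the rest of the column without changing the determinant
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return det

swap_rows_12 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # the matrix of Exercise 2
M = [[2, -1, 3], [0, 4, 1], [5, 2, -2]]            # hypothetical example
M_T = [list(col) for col in zip(*M)]
print(det_by_row_operations(swap_rows_12), det_by_row_operations(M), det_by_row_operations(M_T))
```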


4 By 3 every $n \times n$ matrix can be written as a composition (product) of matrices of three simple types (rotations, shears, scale factors). For each of these types the determinant of the transpose is immediately seen to be the determinant of the matrix itself. The transpose of a product is the product of the transposes in reverse order, and the determinant of a product is the product of the determinants. Hence if $M = M_1M_2\cdots M_k$ then $M^T = M_k^T\cdots M_2^TM_1^T$ and
$$\det(M^T) = \det(M_k^T)\cdots\det(M_1^T) = \det(M_1)\det(M_2)\cdots\det(M_k) = \det(M)$$
as desired.


§4.4 page 7712


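1 $x = \frac{5}{3}u - \frac{2}{3}v - \frac{4}{3}w$, $y = \frac{11}{3}u - \frac{5}{3}v - \frac{7}{3}w$, $z = -4u + 2v + 3w$

2 Set $x = \frac{5}{3}u - \frac{2}{3}v - \frac{4}{3}w + b_1$, $y = \frac{11}{3}u - \frac{5}{3}v - \frac{7}{3}w + b_2$, $z = -4u + 2v + 3w + b_3$. The point $(x, y, z) = (0, 0, 0)$ corresponds to $(u, v, w) = (7, -2, 1)$. Substituting in the first equation gives $0 = \frac{5}{3}\cdot 7 - \frac{2}{3}(-2) - \frac{4}{3}\cdot 1 + b_1$, hence $b_1 = -35/3$. Similarly $b_2 = -80/3$, $b_3 = 29$.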
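The constants in Exercise 2 can be checked with exact rational arithmetic. Each $b_i$ is minus the value of the linear part of the $i$th coordinate at $(u, v, w) = (7, -2, 1)$. The coefficients below are the ones given in Exercise 1.

```python
from fractions import Fraction

# Linear parts of x, y, z (coefficients of u, v, w) from Exercise 1.
linear = [
    (Fraction(5, 3), Fraction(-2, 3), Fraction(-4, 3)),    # x
    (Fraction(11, 3), Fraction(-5, 3), Fraction(-7, 3)),   # y
    (Fraction(-4), Fraction(2), Fraction(3)),              # z
]
point = (7, -2, 1)   # (u, v, w) that must correspond to (x, y, z) = (0, 0, 0)

# b_i = -(linear part of the i-th coordinate evaluated at the point)
b = [-sum(c * p for c, p in zip(row, point)) for row in linear]
print(b)
```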

3 (a) Reduce the system to the form (2). Since $r \le m = 2 < n$, the equations (2a) contain at least one of the variables $(x, y, z)$ on the right side. Choosing two different values of this variable (or these variables) and fixing values of $(u, v)$, the equations (2a) give two points on the same level surface. (b) Reduce the system to the form (2). Since $r \le n = 2 < m$, the equations (2b) contain at least one equation. Fixing the values of the variables among $(u, v, w)$ on the right side of (2b) determines the values of those on the left. Choosing different values for those on the left gives points $(u, v, w)$ not in the image.


4 As in 3, if $n > m$ then equations (2a) imply that the map is not one-to-one, and if $n < m$ then the equations (2b) imply that the map is not onto.


5 (a) The 'only if' half: because $1 = \det(MM^{-1}) = \det(M)\det(M^{-1})$, if $\det(M^{-1})$ is an integer then $\det(M) = \pm 1$. Hence $M^{-1}$ has integer entries only if $\det(M) = \pm 1$. (b) The formula for $M^{-1}$ gives its entries as integers divided by $\det(M)$, hence $M^{-1}$ has integer entries if $\det(M) = \pm 1$.
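The $2 \times 2$ case makes part (b) of Exercise 5 concrete: the inverse is the adjugate (integer entries) divided by the determinant. A sketch using exact fractions follows. The sample matrices are illustrative and do not come from the text.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of an integer 2 x 2 matrix as exact fractions: adjugate / determinant."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def has_integer_inverse(m):
    return all(v.denominator == 1 for row in inverse_2x2(m) for v in row)

print(has_integer_inverse([[2, 3], [1, 2]]))   # determinant 1
print(has_integer_inverse([[1, 2], [3, 4]]))   # determinant -2
```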


§4.5 pages 124-127


1 (a) (1, -1, 0), (0, 1, -1), dim = 2

(b) (1, 1, 1), (-1, 0, 1), dim = 2

(c) (1, 1, 1), dim = 1

(d) (1, 3, 5), dim = 1

(e) dim = 0

2 (a) (1, -1, 0, 0, 0, 0), (0, 1, -1, 0, 0, 0), ..., (0, 0, 0, 0, 1, -1), dim = 5 (b) (1, 1, 1, 1, 1, 1), (0, 1, 2, 3, 4, 5), dim = 2


3 (a) Given an arrow A and an arrow B, place 
the beginning point of B at the ending point of A 
and take A + B to be the arrow from the be- 
ginning point of A to the ending point of B. 
(b) Multiply the length of the arrow by the 
number, leaving the direction unchanged; if the 
number is negative, reverse the direction of the 
arrow. (c) The dimension is 2 and a basis is 
any two non-collinear arrows. (d) Displace- 
ments are described by arrows (directed line seg- 
ments). The sum in (a) is the composition of two 
displacements. Forces are described by arrows. 
The sum of (a) tells how two forces give a re- 
sultant force. Velocities are described by arrows. 
The sum of (a) tells how the velocities of two 
motions (an airplane in the wind) add to give a 
resultant velocity. 


4 Three arrows in space are linearly dependent 
if and only if they are coplanar. 


5 $1\cdot(1\cdot v) = (1\cdot 1)\cdot v = 1\cdot v$. Thus $1\cdot v$ and $v$ satisfy the same equation and are therefore equal. This proves V. Similarly, $[v + 0\cdot w] + w = v + [0\cdot w + 1\cdot w] = v + (0 + 1)\cdot w = v + w$. Thus $v + 0\cdot w$ and $v$ satisfy the same equation and are therefore equal. This proves VI. Using V and VI it is easily seen that $(1/a_1)\cdot[v - a_2v_2 - \cdots - a_nv_n]$ satisfies the given equation, thus it is the solution guaranteed by IV.


6 By IV, if such a vector exists it is unique. If $w$ is any vector then $0\cdot w$ has the property which defines the zero vector (by II and VI). Hence the zero vector exists and is unique. If $f$ is any function then $0\cdot f$ is the zero vector and is the function whose values are all zero.

7 If $f(v) = 0$ and $f(w) = 0$ then $f(v + w) = f(v) + f(w) = 0 + 0 = 0$, hence $v + w$ is in the kernel. Also $f(av) = af(v) = a\cdot 0 = 0$, so $av$ is in the kernel, hence the kernel is a subspace. The subspace of 1(a) is the kernel of the linear map $(f_1, f_2, f_3) \mapsto f_1 + f_2 + f_3$ of $V_3$ to $V_1$. Similarly the subspaces of 1(b)-(e) are kernels.


8 It shows that the solution $v_1$ asserted by IV lies in the subspace provided that $v, v_2, v_3, \ldots, v_n$ do; hence the elements of the subspace satisfy IV. I-III are immediate.


9 (a) See (b). (b) If $S$ is any set and if $W$ is any vector space then the rule $(f + g)(x) = f(x) + g(x)$ defines an addition operation on the set of functions $\{f: S \to W\}$ and $(af)(x) = a[f(x)]$ defines an operation of multiplication by numbers. Axioms I-IV are satisfied; thus $\{f: S \to W\}$ is a vector space. If $S = V$ is a vector space then the set of all linear maps $\{f: V \to W\}$ is a subspace of the set of all maps (the sum of linear maps is linear, and any multiple of a linear map is linear). Thus the space of linear maps $\mathrm{Hom}(V, W)$ is itself a vector space. (c) Let $\delta_i$ be the element of $V_n$ which is 1 on $i$ and 0 on all other integers $1, 2, 3, \ldots, n$. Similarly let $\varepsilon_i$ be the element of $V_m$ which is 1 on $i$ and 0 on other integers. Then $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m$ are a basis of $V_m$ and $\delta_1, \delta_2, \ldots, \delta_n$ a basis of $V_n$. Given a linear map $f: V_n \to V_m$, define numbers $a_{ij}$ by $f(\delta_j) = \sum_{i=1}^m a_{ij}\varepsilon_i$. Then
$$\begin{aligned} f(x_1, x_2, \ldots, x_n) &= f(x_1\delta_1 + x_2\delta_2 + \cdots + x_n\delta_n) \\ &= x_1f(\delta_1) + x_2f(\delta_2) + \cdots + x_nf(\delta_n) \\ &= x_1\sum_{i=1}^m a_{i1}\varepsilon_i + \cdots + x_n\sum_{i=1}^m a_{in}\varepsilon_i \\ &= y_1\varepsilon_1 + \cdots + y_m\varepsilon_m = (y_1, y_2, \ldots, y_m) \end{aligned}$$
where
$$y_1 = x_1a_{11} + \cdots + x_na_{1n} = \sum_{j=1}^n a_{1j}x_j,$$
etc. Addition of elements of $\mathrm{Hom}(V_n, V_m)$ is the operation of adding matrices by adding corresponding components. Multiplication by numbers in $\mathrm{Hom}(V_n, V_m)$ is the operation of multiplying all entries of a matrix by a given number. (d) dim = $nm$. A basis is given by the matrices which are 1 in one position and 0 in all others.


10 (a) An element $\phi$ of $W^*$ is a linear map $\phi: W \to V_1$. If $f: V \to W$ is linear then the composition $\phi \circ f: V \to V_1$ is an element of $V^*$, denoted $f^*(\phi)$. This map $f^*: W^* \to V^*$ is linear. (b) $\dim(V^*) = \dim(V)$. (c) Given a basis $v_1, v_2, \ldots, v_n$ of $V$, define $\phi_1: V \to V_1$ by $\phi_1(x_1v_1 + x_2v_2 + \cdots + x_nv_n) = x_1$. Defining $\phi_2, \phi_3, \ldots, \phi_n$ analogously gives a basis of $V^*$.


11 (a) Elements of $(V_n)^*$ are row matrices, i.e. $1 \times n$ matrices. (b) By the transposed matrix.


12 If $r < n$ then the zero vector must be the image of a non-zero vector ($v_n$), hence the first alternative does not hold. Any element of $V_n$ which is in the image is the image of more than one element (add any multiple of $v_n$), hence the second alternative holds. If $r = n$ then the map is one-to-one and onto, i.e. every element of $V_n$ is the image of just one element of $V_n$ under the given map; hence the first alternative holds.


1-1 -1 
13 Gd 1 D){0 1 O}J=d O O). 


0 0 1 
( i 0) 
a -2 plo 1 o)=a 0 Q. 
0 0 1 

1 0 0 4 —3 1 0 0 1 

T 1 1 
CEECEE 49) 

9 9 9 4 


1 0 0 
={0 1 0J: 
0 0 1 


Of course there are many solutions, for example 


4-3 1\/ #& #& &\ /1 0 0 
1 1-3 -%4 á #)=(0 1 0} 
2 1 1/\- -4 %/ \0 01 


q 1 1 1 1 1 


Na” 


O Om oo 


1 0 0 1-1 A/A 0 1 
0 0 1 0 1—1ijļj{0 0 1 
1 1 1/\—1 0 T1/\1 1 


1 
1 0 0 
-(0 1 0 
0 0 0 
1 
C DEI- 1 | 
š =5/ M1 -2 V\y 0 ił 
-2A 
=lo 1 0 




1-2 1 0 0 0 

0 1-2 1 0 0 

0 0 1-2 1 0 

0 0 0 1-2 1 

12 3 4 5-4 

0 12 3 4-3 

<|0 0 1 2 3-2 

0 0 0 1 2-1 

00 001 0 

000001 
100 0 0 0 
|o 1.0 0 0 0 
“jo 0 1 0 0 0 
00 0 1 0 0 


Set $v_i = Q(\delta_i)$ = $i$th column of $Q$, and set $w_i = P^{-1}(\delta_i)$ = $i$th column of $P^{-1}$. Since $Q$ and $P^{-1}$ are invertible these are bases. $M(v_i) = P^{-1}PMQ(\delta_i) = P^{-1}C_r(\delta_i)$, which is $w_i$ if $i \le r$ and 0 if $i > r$.


§4.6 pages 130-131


1 If the map (1) in the Implicit Function Theorem is onto then equations (2b) are absent and the equations (2a) give a parameterization of each level surface $y = \text{const.}$ by $x_{m+1}, x_{m+2}, \ldots, x_n$. If the map (1) is one-to-one then the equations (2b) give $m - n$ equations which define the image.


2 Ifthe pullbacks of all k-forms under M2 are 
zero then the pullbacks of all k-forms under 
Mı = Pv!Me2Qz! are zero (by the Chain 
Rule) and vice versa. This shows that rank 
Mı = rank M2. Since M® = QMC“P™ and 
since Q%, P™ are invertible, it follows that 
rank M® = rank C® = (). 


3 (a) In this case $f_P \circ f_Q^{-1}$ is of the form $f_Q \circ A \circ f_Q^{-1}$, where $A: \mathbb{R}^n \to \mathbb{R}^n$ is the map $y_1 = 1 - x_1 - x_2 - \cdots - x_n$, $y_2 = x_2$, $y_3 = x_3, \ldots, y_n = x_n$, which interchanges $(0, 0, \ldots, 0)$ and $(1, 0, \ldots, 0)$ and leaves $(0, 1, 0, \ldots, 0), \ldots, (0, \ldots, 0, 1)$ fixed. Since $dy_1\,dy_2\cdots dy_n = -dx_1\,dx_2\cdots dx_n$, this map has Jacobian
$$\frac{\partial(y_1, y_2, \ldots, y_n)}{\partial(x_1, x_2, \ldots, x_n)} = -1$$
and the result follows. (b) The Chain Rule implies that the composition of two maps with negative Jacobian has positive Jacobian.


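(c) $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$ for $0 \le \theta \le \pi/2$ gives a continuous deformation of $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ to $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$.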


To do this for shears, translations, and positive 
scale factors is easy. Writing fg as a product of 
such factors and deforming the factors to the 
identity one by one deforms fg to the identity, 
hence deforms $Q$ to $P$. (d) Clear. (e) The given determinant is the Jacobian of the map
$$\begin{aligned} x &= x_0 + (x_1 - x_0)u + (x_2 - x_0)v + (x_3 - x_0)w \\ y &= y_0 + (y_1 - y_0)u + (y_2 - y_0)v + (y_3 - y_0)w \\ z &= z_0 + (z_1 - z_0)u + (z_2 - z_0)v + (z_3 - z_0)w \end{aligned}$$
carrying (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1) to the given quadruple, hence its sign tells whether it agrees or disagrees with the standard quadruple.
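Part (e) reduces orientation to the sign of a $3 \times 3$ determinant whose columns are $p_1 - p_0$, $p_2 - p_0$, $p_3 - p_0$. Below is a minimal sketch; the function name `orientation` is hypothetical. Interchanging two points of the quadruple flips the sign, which is the three-dimensional content of part (a).

```python
def orientation(p0, p1, p2, p3):
    """Sign of the Jacobian of the affine map carrying (0,0,0), e1, e2, e3 to p0, p1, p2, p3."""
    # columns of the coefficient matrix: p1 - p0, p2 - p0, p3 - p0
    cols = [[p[k] - p0[k] for k in range(3)] for p in (p1, p2, p3)]
    (a, d, g), (b, e, h), (c, f, i) = cols      # so the matrix rows are (a,b,c), (d,e,f), (g,h,i)
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return (det > 0) - (det < 0)                 # +1, -1, or 0 for a degenerate quadruple

standard = ((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
print(orientation(*standard))
print(orientation((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 1)))   # first two points swapped
```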


4 Let $r_i$ be the rank of $f_i$. Then $n_i - r_{i+1}$ = dimension of level surface of $f_{i+1}$ = dimension of image of $f_i$ = $r_i$. Hence $n_i = r_i + r_{i+1}$. Thus
$$n_1 - n_2 + n_3 - \cdots \pm n_n = (r_1 + r_2) - (r_2 + r_3) + (r_3 + r_4) - \cdots \pm (r_n + r_{n+1}) = r_1 \pm r_{n+1} = 0 + 0 = 0.$$


§5.1 pages 140-142


1 Any point where $u \neq 0$. Any point where $v \neq 0$. Explicit solutions are $u = \pm\sqrt{y + v^2}$, $v = \pm\sqrt{-y + u^2}$, with the sign chosen in accord with $(\bar u, \bar v)$. The equation $y = u^2 - v^2$ can be solved by
$$(u, v) = \begin{cases} (\sqrt{y}, 0) & \text{if } y > 0 \\ (0, 0) & \text{if } y = 0 \\ (0, \sqrt{-y}) & \text{if } y < 0. \end{cases}$$
Hence for all $y$ there is a solution. Among other things, the level curves $y = \text{const.}$ are very different near (0, 0) in the three cases $y > 0$, $y = 0$, $y < 0$.

2 $y + 2x = (u + v)^2$, $y - 2x = (u - v)^2$; hence $u = \frac{1}{2}[\pm\sqrt{y + 2x} \pm \sqrt{y - 2x}]$, $v = \frac{1}{2}[\pm\sqrt{y + 2x} \mp \sqrt{y - 2x}]$. Points of the image must satisfy $y + 2x \ge 0$, $y - 2x \ge 0$. The lines $u + v = 0$, $u - v = 0$ divide the $uv$-plane into 4 parts, each of which is mapped one-to-one onto the wedge $y > -2x$, $y > 2x$. The signs under the radical signs in (6′) are necessarily opposite, which leaves 8 possibilities. One choice determines the sign of $u$, one the sign of $v$, and the sign under the radical sign determines whether $|u| > |v|$ or $|v| > |u|$. Hence each of the 8 choices is valid in one of the 8 regions into which the $uv$-plane is divided by the lines $u = 0$, $v = 0$, $u = v$, $u = -v$.




3 Solution is possible provided that the line 
= 0, w =W is not tangent to the sphere 
u? + v2 + w? = R? + 0? + W, i.e. provided 
i ~ 0. 
4 This map is non-singular of rank 2 at all points other than $(u, v) = (0, 0)$. The curves $x = \text{const.}$ and $y = \text{const.}$ are rectangular hyperbolae intersecting at right angles. For each $(x, y) \neq (0, 0)$ there are two solutions $(u, v)$ given by $u = \pm\sqrt{\tfrac{1}{2}(x + \sqrt{x^2 + y^2})}$, $v = y/(2u)$, so that $v^2 = \tfrac{1}{2}(-x + \sqrt{x^2 + y^2})$.
5 Non-singular of rank 2 at all points. $x^2 + y^2 = e^{2u}$ gives $u = \frac{1}{2}\log(x^2 + y^2)$, and $\sin v/\cos v = y/x$ gives $v = \arctan(y/x)$. Giving $(x, y) \neq (0, 0)$ determines $u$ and determines infinitely many values of $v$, any two of which differ by a multiple of $2\pi$. The point $(x, y) = (0, 0)$ is not in the image.


6 The map of 5 is everywhere non-singular of rank 2 but not onto ((0, 0) is omitted) or one-to-one ((0, 0) and (0, 2π) have the same image). The sign of the derivative of a non-singular map $f: \mathbb{R} \to \mathbb{R}$ does not change, which implies that the function is increasing or decreasing, hence one-to-one. The map $y = e^x$ is non-singular of rank 1 but not onto.


7 The map $t = u + v$ has as its level surfaces the lines $u + v = \text{const.}$ The map $x = \cos t$, $y = \sin t$ wraps the $t$-line around the circle $x^2 + y^2 = 1$.

8 See §7.1. Arcsin $x$, defined for $|x| \le 1$, is the unique number $y$ in the interval $\{-\pi/2 \le y \le \pi/2\}$ such that $x = \sin y$.

9 The curve intersects $y = ax + b$ if and only if $x$ satisfies a cubic equation, hence in 1 or 3 points, unless $a = -1$, in which case the equation has degree 2 in $x$. It intersects $x + y = C$ in 1 point (where $t = 0$) if $C = 1$, in 2 points for $0 < C < 1$, in 1 point for $C = 0$, in two points for $-1/3 < C < 0$ and in no points for $C < -1/3$ or $C > 1$; these conclusions all follow from the fact that $t^2 = a$ has two solutions if $a > 0$, one if $a = 0$, none if $a < 0$. The curve has $x + y = -1/3$ as an asymptote and makes a loop (or 'leaf' = folium) in the first quadrant. $F$ is singular where $3x^2 - y = 0$, $3y^2 - x = 0$, which gives $(x, y) = (0, 0)$ or $(x, y) = (1/3, 1/3)$. $F$ is negative inside the folium and to the southwest of the folium, positive to the northeast. The singularity at $(1/3, 1/3)$ is a local minimum of $F$. The curves $F = \text{const.}$ consist of two pieces when the constant is negative, one if positive. Solving for $y$ as a function of $(F, x)$ is possible provided $3y^2 - x \neq 0$. Assuming $F = 0$ this is true provided $(x, y) \neq (0, 0)$ and $(x, y) \neq (\sqrt[3]{4}/3, \sqrt[3]{2}/3)$. These points cut the curve into 4 pieces of the form $(x, f(x))$.
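These claims are consistent with $F(x, y) = x^3 + y^3 - xy$. That form is inferred here from the stated singularity equations $3x^2 - y = 0$, $3y^2 - x = 0$, and it is an assumption of this sketch. The sketch checks the critical point (1/3, 1/3), the local-minimum claim via the Hessian, and the point where $3y^2 - x = 0$ on the curve. It also checks the intersection counts with $x + y = C$, using the substitution $x = C/2 + t$, $y = C/2 - t$. A short computation turns that substitution into $t^2 = C^2(1 - C)/(4(3C + 1))$.

```python
def F(x, y):
    # assumed form of F, inferred from the singularity equations in the answer
    return x**3 + y**3 - x * y

# Critical point (1/3, 1/3): grad F = (3x^2 - y, 3y^2 - x) vanishes there.
cx = cy = 1 / 3
grad = (3 * cx**2 - cy, 3 * cy**2 - cx)

# Hessian of F is [[6x, -1], [-1, 6y]]; positive definite means a local minimum.
fxx, fxy, fyy = 6 * cx, -1.0, 6 * cy
is_local_min = fxx > 0 and fxx * fyy - fxy**2 > 0

# Point of F = 0 where 3y^2 - x = 0 (vertical tangent): (cbrt(4)/3, cbrt(2)/3).
vx, vy = 4 ** (1 / 3) / 3, 2 ** (1 / 3) / 3

def intersections_with_line(C):
    """Number of points of F = 0 on the line x + y = C."""
    if 3 * C + 1 == 0:
        return 0                                  # x + y = -1/3 is the asymptote
    a = C**2 * (1 - C) / (4 * (3 * C + 1))      # x = C/2 + t, y = C/2 - t gives t^2 = a
    return 2 if a > 0 else (1 if a == 0 else 0)

print(F(cx, cy), grad, is_local_min)
print(F(vx, vy), 3 * vy**2 - vx)
print([intersections_with_line(C) for C in (-0.5, -0.2, 0.0, 0.5, 1.0, 2.0)])
```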


10 If $\partial F/\partial z \neq 0$ then $z$ can be written as a function of $x$, $y$, and $F$. Setting $F = 0$ gives $z$ as a function of $x$ and $y$.


11 $dF \neq 0$ by assumption. If $dF\,dG = 0$ the map $(x, y, z) \to (F, G)$ is non-singular of rank 1 and the equation (2b) gives a functional relation $G - g(F) = 0$. Conversely, if $dF\,dG \neq 0$, then $F$, $G$ cannot be functionally related, because $f(F, G) = 0$ implies that $df = (\partial f/\partial F)\,dF + (\partial f/\partial G)\,dG$ is identically zero as a 1-form in $(x, y, z)$; hence $0 = df\,dG = (\partial f/\partial F)\,dF\,dG$, so $\partial f/\partial F = 0$ as a function of $(x, y, z)$. Similarly $\partial f/\partial G = 0$. Since $(x, y, z) \to (F, G)$ covers all points near $(F(\bar x, \bar y, \bar z), G(\bar x, \bar y, \bar z))$, this implies both partials of $f$ are identically zero near this point, hence $f \equiv 0$ and $f(F, G) = 0$ is not a relation between $F$ and $G$. Thus 'functionally related' implies $dF\,dG = 0$.


12 $f'(x)$ exists and is equal to $2x\sin(1/x^2) - 2x^{-1}\cos(1/x^2)$ for $x \neq 0$ by the usual rules of differentiation. For $x = 0$ it is the limit of $h\sin(1/h^2)$ as $h \to 0$, which is at most $|h|$ in absolute value, and hence $f'(0) = 0$. But $\lim_{x\to 0} f'(x)$ does not exist, so $f'(x)$ is not a continuous function.


13 A map is non-singular of rank r if and only 
if the local rank and the infinitesimal rank are 
both r. It is singular if and only if the infini- 
tesimal rank is strictly less than the local rank. 


§5.2 pages 147-151 


1 (a) $ya + xb$ (c) 1, 2, -9, -6, -23 (d) $-3a + 2b = 0$ (e) $-3(x - 2) + 2(y + 3) = 0$ (f) $\bar y(x - \bar x) + \bar x(y - \bar y) = 0$ (g) At (0, 0) this is not the equation of a line. The curve $xy = 0$ is singular at (0, 0) and has no tangent line. (h) The line has the equation $(y - \bar y) = -(\bar y/\bar x)(x - \bar x) = -\bar x^{-2}(x - \bar x)$, hence its slope is $-\bar x^{-2}$. (i) $y\,dx + x\,dy = 0$, $dy = -(y/x)\,dx$; hence $dy/dx = -y/x = -1/x^2$.


2 (a) 4xa+ 2yb (b) F > 12 away from 
(0, 0) (e) $4\bar x(x - \bar x) + 2\bar y(y - \bar y) = 0$ (f) The tangent can also be written $(y - \bar y) = (-2\bar x/\bar y)(x - \bar x)$, hence its slope is $-2\bar x/\bar y$.
(g) dz = 4x dx + 2y dy, dy = (1/2y) dz — 
(2x/y) dx; hence when z is constant dy/dx = 
—2x/y = 2x1 — 2x31. 




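3 (a) $(3\bar x^2 - \bar y)(x - \bar x) + (3\bar y^2 - \bar x)(y - \bar y) = 0$. (b) $3\bar y^2 - \bar x = 0$, which combines with $\bar x^3 + \bar y^3 - \bar x\bar y = 0$ to give $(\bar x, \bar y) = (\sqrt[3]{4}/3, \sqrt[3]{2}/3)$. The point $(x, y) = (0, 0)$ is a singularity at which one branch of the curve has a vertical tangent. (c) $(\sqrt[3]{2}/3, \sqrt[3]{4}/3)$ and one branch at (0, 0) (d) $3\bar x^2 - \bar y = 3\bar y^2 - \bar x$ and $\bar x^3 + \bar y^3 = \bar x\bar y$ imply $\bar x = \bar y = 1/2$ (or $\bar x = \bar y = 0$).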

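4 (a) $(\partial F/\partial x)(\bar x, \bar y)(x - \bar x) + (\partial F/\partial y)(\bar x, \bar y)(y - \bar y) = 0$ (b) $f'(\bar x)$ = slope $= -(\partial F/\partial x)/(\partial F/\partial y)$ (c) Set $z = F(x, y)$, $dz = (\partial F/\partial x)\,dx + (\partial F/\partial y)\,dy$; hence when $dz = 0$, $dy/dx = -(\partial F/\partial x)/(\partial F/\partial y)$.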

5 $y = f(x)$, $dy = f'(x)\,dx$. If the equation is solved for $x$ as a function of $y$, $x = g(y)$, then by implicit differentiation $g'(y)$ is found by solving $dx = (1/f'(x))\,dy$. If $y = \log x$ then $x = e^y$, $dx = e^y\,dy$, $dy = (1/e^y)\,dx = (1/x)\,dx$, hence $d[\log x]/dx = 1/x$.

6 (a) Since $x^{p/q} = (x^{1/q})^p$ it suffices to show that every positive number has a unique positive $q$th root. This is obvious from the fact that $y^q$ increases from 0 at $y = 0$ to arbitrarily large numbers for large $y$. See §7.5 for a complete proof. (b) If $q$ is odd then every number has a unique $q$th root. (c) $dy = pt^{p-1}\,dt$, $dx = qt^{q-1}\,dt$, hence $dy/dx = (p/q)t^{p-q} = (p/q)(y/x) = (p/q)x^{p/q-1}$. (e) Only $x = 0$ is at issue. Here it is a question of evaluating $\lim_{h\to 0}|h|^r/h = \lim_{h\to 0}(\operatorname{sign} h)|h|^{r-1}$. If $r > 1$ this exists and is 0, which is also the limit of the derivative as $x \to 0$. If $r < 1$ the limit doesn't exist. (f) $rx|x|^{r-2}$ or $r(\operatorname{sign} x)|x|^{r-1}$

7 (a) 12a — 12b + c (b) Into, if 12a — 
12b + c < 0; out of, if > 0; tangent, if = 0 
(c) 6¥a + 47b + c 

8 (a) If $F(f(t), g(t), h(t))$ is constant then its derivative $(\partial F/\partial x)a + (\partial F/\partial y)b + (\partial F/\partial z)c$ is zero. (b) $12(x - 2) - 12(y + 3) + (z - 1) = 0$
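9 (a) $z - \bar z = (\partial f/\partial x)(x - \bar x) + (\partial f/\partial y)(y - \bar y)$ (b) $(\partial F/\partial x)(x - \bar x) + (\partial F/\partial y)(y - \bar y) + (\partial F/\partial z)(z - \bar z) = 0$ is the same plane as $(\partial f/\partial x)(x - \bar x) + (\partial f/\partial y)(y - \bar y) - (z - \bar z) = 0$, where all derivatives are evaluated at $(\bar x, \bar y, \bar z)$. Therefore
$$(\partial f/\partial x) : -1 = (\partial F/\partial x) : (\partial F/\partial z) \quad\text{and}\quad (\partial f/\partial y) : -1 = (\partial F/\partial y) : (\partial F/\partial z).$$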


10 (a) $r^2\cos\phi$ (b) The planes $r = 0$ and $\phi = (2n + 1)\pi/2$ are singularities. (c) $\{r > 0,\ 0 < \theta < 2\pi,\ -\pi/2 < \phi < \pi/2\}$. (d) $-\sin\theta\,dx + \cos\theta\,dy = r\cos\phi\,d\theta$. Since $y/(x^2 + y^2) = r\sin\theta\cos\phi/(r^2\cos^2\phi)$, and similarly for $x/(x^2 + y^2)$, this gives the derivatives of $\theta$. The equation $\cos\theta\,dx + \sin\theta\,dy = \cos\phi\,dr - r\sin\phi\,d\phi$ combines with $dz = \sin\phi\,dr + r\cos\phi\,d\phi$ to give $dr$, $d\phi$ in terms of $dx$, $dy$, $dz$.


§5.4 pages 183-190


1 $(7, 2, -3)$ is the nearest point. In general, one can set $x = \lambda_1A_1 + \lambda_2A_2$, $y = \lambda_1B_1 + \lambda_2B_2$, $z = \lambda_1C_1 + \lambda_2C_2$ in (*) and solve for $\lambda_1$, $\lambda_2$, hence for $(x, y, z)$. This procedure applies if and only if the planes (*) are not parallel or coincident.


2 The problem is uj !\/a? + b +07! X 
v'a? + b2 = min., subject to bı + be = 
const., and the result follows immediately by 
the method of Lagrange multipliers. 


3 width = length = $2\sqrt{2}$, height = $\sqrt{2}$.


4 Critical points satisfy $yz = \lambda x$, $zx = \lambda y$, $xy = \lambda z$ for some $\lambda$. Then $3xyz = \lambda(x^2 + y^2 + z^2) = \lambda$, so the value at the critical point is $\lambda/3$. This excludes the possibility that $\lambda = 0$ at a maximum or a minimum. Hence $x \neq 0$, $y \neq 0$, $z \neq 0$ at max. or min. Dividing the first two equations gives $y/x = x/y$, hence $x = \pm y = \pm z = \pm 3^{-1/2}$, and the maximum and minimum values are $1/(3\sqrt{3})$ and $-1/(3\sqrt{3})$, assumed at the points $(\pm 1/\sqrt{3}, \pm 1/\sqrt{3}, \pm 1/\sqrt{3})$.
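A crude numerical confirmation of Exercise 4 samples points uniformly on the unit sphere, using normalized Gaussian vectors, and compares the largest and smallest values of $xyz$ with $\pm 1/(3\sqrt{3}) \approx \pm 0.19245$. Random search only approaches the extremes from inside. It illustrates the result but does not replace the Lagrange-multiplier argument.

```python
import math
import random

def max_min_xyz_on_sphere(samples=100_000, seed=0):
    """Largest and smallest sampled values of xyz on x^2 + y^2 + z^2 = 1."""
    rng = random.Random(seed)
    best, worst = -math.inf, math.inf
    for _ in range(samples):
        # a normalized Gaussian vector is uniformly distributed on the sphere
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        r = math.sqrt(x * x + y * y + z * z)
        value = (x / r) * (y / r) * (z / r)
        best, worst = max(best, value), min(worst, value)
    return best, worst

predicted = 1 / (3 * math.sqrt(3))   # value at (1/sqrt(3), 1/sqrt(3), 1/sqrt(3))
best, worst = max_min_xyz_on_sphere()
print(best, worst, predicted)
```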


5 The critical points satisfy $2\lambda_1u = u - x$, $2\lambda_1v = v - y$, $\lambda_2(\partial f/\partial x) = x - u$, $\lambda_2(\partial f/\partial y) = y - v$, $u^2 + v^2 = 1$, $f(x, y) = 0$. From $x = (1 - 2\lambda_1)u$, $y = (1 - 2\lambda_1)v$ it follows that $1 - 2\lambda_1 \neq 0$ (the origin does not lie on $f = 0$). Thus $(x - u, y - v)$ is a multiple of $(x, y)$ and the desired conclusions follow.


6 (a) fat (x,y) = G5) (b) No, (x, y, 2) = 
(2K + 1, -K, -K) for large values of $K$ gives large values of $xyz$. (c) $\lambda = xy = yz = zx$ gives $\lambda x = xyz = \lambda y = \lambda z$. If $\lambda \neq 0$ then $x = y = z = 1/3$. If $\lambda = 0$ then two of the 3 must be zero and $(x, y, z) = (1, 0, 0)$ or $(0, 1, 0)$ or $(0, 0, 1)$. (d) From $\lambda Ax = Ax^Ay^Bz^C$, $\lambda By = Bx^Ay^Bz^C$, $\lambda Cz = Cx^Ay^Bz^C$ it follows that $\lambda \neq 0$, hence that $x = y = z = (A + B + C)^{-1}$, and the only critical point is a point where the value is $(A + B + C)^{-(A+B+C)}$. (e) A triangle on whose sides the function is identically zero, which is a minimum. (f) As before $x_1 = x_2 = \cdots = x_n = K$ and the maximum is $K$. If $\sum A_i = 1$ and if $x_i > 0$ then
$$x_1^{A_1}x_2^{A_2}\cdots x_n^{A_n} \le \sum A_ix_i.$$
Hence
$$(x_1^{A_1}x_2^{A_2}\cdots x_n^{A_n})^{1/\sum A_i} \le \sum A_ix_i \Big/ \sum A_i.$$
(g) geometric mean $\le$ arithmetic mean (h) Only when the $x$'s are all equal.

7 (a) For $p = 2$ it is a circle. For large $p$ it is nearly the square with vertices $(\pm 1, \pm 1)$. For $p$ slightly larger than 1 it is nearly the square with vertices $(\pm 1, 0)$, $(0, \pm 1)$, which is what it is when $p = 1$. (b) As $\max(|x|, |y|)$ (c) 13 at $(x, y) = (1, 1)$ (d) 9 at $(1, -1)$ (e) $|A| + |B|$ at $(x, y) = (\operatorname{sign} A, \operatorname{sign} B)$. If $A = 0$ it is near the largest value all along one side of the 'square' $|x|^p + |y|^p = 1$.


8 (a) $|A_1| + |A_2| + \cdots + |A_n|$ (b) At points where $x_i = \operatorname{sign} A_i$ for those $i$ for which $A_i \neq 0$. (c) $|A_1x_1 + \cdots + A_nx_n| \le \|A\|_1\|x\|_\infty$ (d) $\max(|A_1|, |A_2|, \ldots, |A_n|)$.


9 Fix $(x_1, x_2)$ and solve $(x_1 + y_1)^2 + (x_2 + y_2)^2$ = max. or min. on $y_1^2 + y_2^2 = K$. It is easily seen that $(x_1 + y_1, x_2 + y_2)$ is a multiple of $(y_1, y_2)$, hence $(y_1, y_2)$ and $(x_1, x_2)$ are collinear; this means there are just two critical points, which must be the max. and min. Similarly in (c) it is easily shown that the max. and min. occur when $(x_1, x_2, \ldots, x_n)$ and $(y_1, y_2, \ldots, y_n)$ are collinear, which is when equality occurs.


10 Elliptic. The minor axis is $(1 - \sqrt{5})x + 2y = 0$ and the max. on $x^2 + y^2 = 1$ occurs at points of the minor axis. The major axis is of course perpendicular.


11 Hyperbolic. The minimum value on $x^2 + y^2 = 1$ is at the intersections with the line $(1 + \sqrt{5})x + 2y = 0$, where it is $2 - \sqrt{5}$.


12 There are essentially two configurations. 
Concentric ellipsoids is one and the other is 
an elliptical double cone surrounded by hyper- 
boloids of one sheet and containing hyperboloids 
of two sheets. Otherwise there are the ‘degen- 
erate’ cases where the level surfaces are elliptical 
cylinders, or hyperbolic cylinders, or planes. 


13 If not, then $(x, y, z)$ is a multiple of $(x_1, y_1, z_1)$, which implies, since $x_1x + y_1y + z_1z = 0$, that the first equation is not satisfied.


14 (a) $u^2 + v^2 + w^2 = 1$ (b) $\lambda_1u^2 + \lambda_2v^2 + \lambda_3w^2$ (c) The seven points $(0, 0, 0)$, $(\pm 1, 0, 0)$, $(0, \pm 1, 0)$, $(0, 0, \pm 1)$ are the only critical points if $0, \lambda_1, \lambda_2, \lambda_3$ are distinct. Otherwise there can be disks, line segments, circles, or spheres as before.


15 (a) The polynomial has leading term $(-\lambda)^n$, hence it is positive for $\lambda = -K$ ($K$ large). If its constant term is negative it must change sign, and hence have a root, between 0 and $-K$.


16 Let $A = (A_{ij})$ be the symmetric matrix corresponding to $Q$. As in the text there is a matrix $M$ such that $AM = M\Lambda$, where $\Lambda$ is a matrix with $\lambda_i$ in the $i$th column of the $i$th row ($i = 1, 2, \ldots, n$) and 0 elsewhere, and where the transpose $M^T$ of $M$ is its inverse, i.e. $MM^T = I$. Since $\det(A) = \det(\Lambda)$, two of the $\lambda_i$ must be negative unless $Q$ is positive definite. But then $Q$ would be negative definite on a 2-dimensional affine manifold through the origin. Since this manifold must intersect $x_n = 0$, the assumption that $Q(x_1, x_2, \ldots, x_{n-1}, 0) > 0$ would then be contradicted.


17 The determinants are > 0. 


18 (a) Use estimates such as those on p. 154. (b) Choose $\varepsilon$ smaller than the least value of $Q$ on $u^2 + v^2 + w^2 = 1$ and choose $\delta$ as in (a) with $B = 1$. Then for $(x, y, z)$ satisfying $0 < (x - \bar x)^2 + (y - \bar y)^2 + (z - \bar z)^2 < \delta$, setting $s = [(x - \bar x)^2 + (y - \bar y)^2 + (z - \bar z)^2]^{1/2}$ shows that $|s^{-2}[F(x, y, z) - F(\bar x, \bar y, \bar z)] - Q(u, v, w)| < \varepsilon$ where $u^2 + v^2 + w^2 = 1$. Thus $F(x, y, z) > F(\bar x, \bar y, \bar z)$ as desired. (c) If $Q$ assumed negative values then an argument like (b) would show that there must be points arbitrarily near $(\bar x, \bar y, \bar z)$ where $F(x, y, z) < F(\bar x, \bar y, \bar z)$. (d) If all are positive there is a local minimum. If one is negative there is not a local minimum. Otherwise neither conclusion can be drawn.


19 If two consecutive points coincided there would be an $(n - 1)$-gon with the same $L$ and $A$, hence $A \le (1/4(n - 1))\cot(\pi/(n - 1))L^2 < (1/4n)\cot(\pi/n)L^2$, and the given polygon would not be a solution of $A$ = max., $L$ = const., because it is less than a regular $n$-gon. $L$ is differentiable because $\sqrt{x}$ is differentiable for $x > 0$.
The equations dA = dL are v;41 + v:/2 = 
Muy — Ui+i)/Ti, — (uig i +u:)/2 = N(vi — vi+1)/li. 
dL # 0 because dL = 0 would imply all ws are 
equal and all vs are equal, hence, since 
Seu; = 0, $ 3-1 v: = 0, would imply all u’s 
and v’s were zero. Since ?2,, — I? = (uia41 + 
+ u:)(uipi — Ui) + (Vipi + vtip — Ui) = 
(Vipi — v)(uiy1 — u)Qd — 2p! = 0 it fol- 




lows that J; = [41 = L/n. Then 

Wisi — NOi41 = —U; — NO; 

N Uig 1 + Ui41 = Nu; — U; 
where \’ = 2\/nL, hence 
(i) =— Ge TT) Ca 1) G) 

visi) “o | -N 1/ w 
The matrix M(N) is (see $7.5) of the form 
a —b 
( a) 


where a? + b? = 1 and where a = 1. Thus 


(a, b) = (cos 6, sin 0) for 0 < 0 < 2r and 
„ _ (cosn@ —sin nô 
[MW] = (o nô COs no) 


and the fact that [M(\)]” — Iis not one-to-one 
gives cos nô = 1, n0 = 2rj for some j. 


§5.5 pages 193-195


1 If $n = k$ in (1) and if the map is non-singular of rank $k$ then the image is defined by the $m - k$ equations (2b), i.e. by the equations $z_i = 0$ where $z_i = y_i - h_i(y_1, \ldots, y_k)$. These equations are independent because $\partial(z_{k+1}, \ldots, z_m)/\partial(y_{k+1}, \ldots, y_m) = 1$. If $m = n - k$ and if (1) is non-singular of rank $m$ then the equations (2a) parameterize the level surfaces using the $k$ independent variables $x_{n-k+1}, \ldots, x_n$.


2 Set $y = x_1^2 + \cdots + x_n^2$. Then $dy = 0$ only at the origin, so the equation $y = \text{const.}$ defines a differentiable manifold near any point other than $(0, 0, \ldots, 0)$. This point is the unique point (zero-dimensional manifold) where $y = 0$.


3 No. Near (0, 0) it consists of two branches 
and cannot be described by an equation F = 
const. 


4 Such a matrix is invertible if and only if $x_1x_4 - x_2x_3 \neq 0$. The set of such matrices is a 4-dimensional manifold parameterized by $x_1, x_2, x_3, x_4$.


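5 Set $a_1 = x_1^2 + x_2^2$, $a_2 = x_3^2 + x_4^2$, $b = x_1x_3 + x_2x_4$. Then orthogonal matrices are defined by the relations $a_1 = 1$, $a_2 = 1$, $b = 0$. These relations are independent because
$$da_1\,da_2\,db = 4(x_2x_3 - x_1x_4)(x_3\,dx_1\,dx_2\,dx_3 + x_4\,dx_1\,dx_2\,dx_4 + x_1\,dx_1\,dx_3\,dx_4 + x_2\,dx_2\,dx_3\,dx_4)$$
is not zero at any point $(x_1, x_2, x_3, x_4)$ satisfying $a_1 = 1$, $a_2 = 1$, $b = 0$ (at such a point $x_1x_4 - x_2x_3 = \pm 1$ and not all $x_i$ are zero). Thus these relations define a 1-dimensional manifold. The parametric curves
$$\begin{pmatrix} \sqrt{1 - t^2} & t \\ -t & \sqrt{1 - t^2} \end{pmatrix}, \qquad \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}, \qquad \begin{pmatrix} \frac{1 - t^2}{1 + t^2} & \frac{-2t}{1 + t^2} \\ \frac{2t}{1 + t^2} & \frac{1 - t^2}{1 + t^2} \end{pmatrix}$$
for $t$ near zero all parameterize the orthogonal matrices near the identity matrix.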
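The coefficient of $dx_p\,dx_q\,dx_r$ in $da_1\,da_2\,db$ is the $3 \times 3$ minor of the Jacobian matrix of $(a_1, a_2, b)$ on columns $p, q, r$. The sketch below compares those minors with the factored expression at random points. It assumes, as above, that the matrix is $\begin{pmatrix} x_1 & x_2 \\ x_3 & x_4 \end{pmatrix}$, so that $a_1, a_2$ are squared row lengths and $b$ is the dot product of the rows.

```python
import random
from itertools import combinations

def jacobian(x1, x2, x3, x4):
    # rows: gradients of a1 = x1^2 + x2^2, a2 = x3^2 + x4^2, b = x1*x3 + x2*x4
    return [
        [2 * x1, 2 * x2, 0, 0],
        [0, 0, 2 * x3, 2 * x4],
        [x3, x4, x1, x2],
    ]

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def three_form_coefficients(x):
    """Coefficients of dx1dx2dx3, dx1dx2dx4, dx1dx3dx4, dx2dx3dx4 in da1 da2 db."""
    J = jacobian(*x)
    return [det3([[row[c] for c in cols] for row in J])
            for cols in combinations(range(4), 3)]

def closed_form(x1, x2, x3, x4):
    k = 4 * (x2 * x3 - x1 * x4)
    return [k * x3, k * x4, k * x1, k * x2]

rng = random.Random(1)
points = [[rng.uniform(-2, 2) for _ in range(4)] for _ in range(100)]
max_error = max(abs(p - q)
                for x in points
                for p, q in zip(three_form_coefficients(x), closed_form(*x)))
print(max_error)                                 # only floating-point rounding remains
print(three_form_coefficients((1, 0, 0, 1)))     # coefficients at the identity matrix
```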


6 Let $x_{ij}$ $(i, j = 1, 2, \ldots, n)$ denote the entries of a typical $n \times n$ matrix, and let $a_i = \sum_{k=1}^n x_{ik}^2$, $b_{ij} = \sum_{k=1}^n x_{ik}x_{jk}$. A matrix is orthogonal if and only if it satisfies the $n + n(n - 1)/2$ relations $a_i = 1$ $(i = 1, 2, \ldots, n)$, $b_{ij} = 0$ $(1 \le i < j \le n)$. If it is shown that these relations are independent it will follow that orthogonal matrices are an $n^2 - n - n(n - 1)/2 = n(n - 1)/2$-dimensional manifold. They are independent near the identity matrix ($x_{ij} = 1$ if $i = j$ and 0 otherwise) because at this point $da_i = 2\,dx_{ii}$ and $db_{ij} = dx_{ij} + dx_{ji}$, so that $da_1\cdots da_n\,db_{12}\,db_{13}\cdots db_{n-1,n} \neq 0$. Near any orthogonal matrix $M_1$ the orthogonal matrices are those $M$ for which $M_1^{-1}M$ is orthogonal (because the inverse of an orthogonal matrix is orthogonal and the product of two orthogonal matrices is orthogonal). Hence multiplication by $M_1^{-1}$ composed with the functions $a_1, \ldots, b_{12}, \ldots$ gives $n(n + 1)/2$ independent relations defining the orthogonal matrices near $M_1$.


7 (a) $(x - a)^2 + y^2 = 1$ and $-2(x - a) = 0$ give $y^2 = 1$, $y = \pm 1$. These two lines are obviously envelopes of the circles (*). (b) The equations $f = \text{const.}$, $f_a = 0$ define a 1-dimensional differentiable manifold near $(\bar x, \bar y, \bar a)$. Let $(x(t), y(t), a(t))$ be a parameterization of this curve. Since $(x(t), y(t))$ satisfies $g(x, y) = \text{const.}$ it suffices to show that this curve $(x(t), y(t))$ is tangent to the curve $f(x, y, a(t)) = \text{const.}$ in the $xy$-plane. Using $f = \text{const.}$ and $f_a = 0$ gives
$$0 = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial a}\frac{da}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$
and the desired result follows. (c) This example illustrates very well the difference between the theoretical solutions of (b) and actual solutions by formulas. The formulas here can get very complicated. A fairly easy solution is the following: set $v_x = v\cos\alpha$, $v_y = v\sin\alpha$. Eliminating $t$, clearing denominators and using $2\cos^2\alpha =$

Answers to Exercises | page 200 484 


$\cos 2\alpha + 1$, $2\sin\alpha\cos\alpha = \sin 2\alpha$, puts the equation of the trajectory in the form $yv^2 + gx^2 + yv^2\cos 2\alpha - xv^2\sin 2\alpha = 0$. Differentiating with respect to $\alpha$ and setting equal to zero gives $yv^2\sin 2\alpha + xv^2\cos 2\alpha = 0$, which leads to $\sin 2\alpha = \pm x/\sqrt{x^2 + y^2}$, $\cos 2\alpha = \mp y/\sqrt{x^2 + y^2}$. Using these to eliminate $\alpha$ from the original equation gives $yv^2 + gx^2 = \pm v^2\sqrt{x^2 + y^2}$. Squaring and cancelling then gives $2gv^2y + g^2x^2 = v^4$ as an envelope of the trajectories.
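The envelope in (c) can be checked numerically. This is a sketch, not part of the original answer. It assumes the trajectories are $x = vt\cos\alpha$, $y = vt\sin\alpha - gt^2/2$, which is the motion implied by $v_x = v\cos\alpha$, $v_y = v\sin\alpha$. For a few fixed $x$ it compares the greatest height over a fine grid of launch angles with the height on $2gv^2y + g^2x^2 = v^4$. The function names are illustrative.

```python
import math


def trajectory_height(x, alpha, v, g):
    """Height y at horizontal distance x for launch speed v and angle alpha."""
    return x * math.tan(alpha) - g * x**2 / (2 * v**2 * math.cos(alpha) ** 2)


def envelope_height(x, v, g):
    """y on the curve 2 g v^2 y + g^2 x^2 = v^4."""
    return (v**4 - g**2 * x**2) / (2 * g * v**2)


def max_height_over_angles(x, v, g, samples=20000):
    """Brute-force maximum of trajectory_height over angles strictly inside (-pi/2, pi/2)."""
    return max(
        trajectory_height(x, -math.pi / 2 + math.pi * k / samples, v, g)
        for k in range(1, samples)
    )


for x in (1.0, 5.0, 8.0):
    print(x, max_height_over_angles(x, 10.0, 9.8), envelope_height(x, 10.0, 9.8))
```

The brute-force maximum never exceeds the envelope, and it approaches the envelope as the angle grid is refined.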


§6.2 page 200


1 $\pi ab$.
2 The absolute value of
$$\frac{1}{6}\begin{vmatrix}1 & x_0 & y_0 & z_0\\ 1 & x_1 & y_1 & z_1\\ 1 & x_2 & y_2 & z_2\\ 1 & x_3 & y_3 & z_3\end{vmatrix}.$$
[See Ex. 3(e), §4.6 and Ex. 5, §1.4.]
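The determinant formula is easy to check on simple tetrahedra. Below is a minimal sketch with illustrative helper names. It uses exact rational arithmetic and cofactor expansion, which is adequate for a $4 \times 4$ determinant.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def tetra_volume(p0, p1, p2, p3):
    """|det| / 6 of the matrix whose rows are (1, x_i, y_i, z_i)."""
    rows = [[1, *p] for p in (p0, p1, p2, p3)]
    return abs(Fraction(det(rows), 6))


print(tetra_volume((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)))  # 1/6
```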


3 Adopting the notation of §2.3, if $|S| < \delta$ then $V(S)$, which is the total volume of all rectangular parallelepipeds of $S$ which lie partly inside $D$ and partly outside $D$, is at most $(1 + \delta)^3 - (1 - \delta)^3 = 6\delta + 2\delta^3 \to 0$.


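4 The matrix equation
$$\begin{pmatrix}a & 0\\ 0 & 1\end{pmatrix}\begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}\begin{pmatrix}a^{-1} & 0\\ 0 & 1\end{pmatrix} = \begin{pmatrix}1 & a\\ 0 & 1\end{pmatrix}$$
shows that an arbitrary shear can be written as a composition of the types (i)-(iv). Then apply Ex. 3, §4.3.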

5 (a) As in Ex. 3, give an explicit estimate of the total volume of the rectangles of $S$ which lie partly inside and partly outside $f(D)$. (b) For each $\delta$ give a particular approximating sum $\Sigma(\alpha)$ to $\int_{f(D)} dy$ such that $|\alpha| < \delta$ and such that $\Sigma(\alpha) = \int_D dx$. (c) If the formula holds for disjoint (non-overlapping) sets $D_1, D_2, \ldots, D_N$ it holds for their union. (d) Follows from the definition of $V = \int_D dx$. (e) To
prove (a)-(e), note that any approximating sum to $\int_{f(D)} dy$ is less than the corresponding approximating sum to $\int_{f_\epsilon(D)} dy$ and that the latter converges to $\int_D dx$, which is less than $V + \epsilon$.


§6.3 pages 213-214


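1 Let $g(x)$ denote the square of the suggested function $f(x)$. Then $g(x)$ is differentiable. Given $P = (\bar x, \bar y, \bar z)$ set
$$c(x, y, z) = g\Big(\frac{x - \bar x}{\epsilon}\Big)g\Big(\frac{\bar x - x}{\epsilon}\Big)g\Big(\frac{y - \bar y}{\epsilon}\Big)g\Big(\frac{\bar y - y}{\epsilon}\Big)g\Big(\frac{z - \bar z}{\epsilon}\Big)g\Big(\frac{\bar z - z}{\epsilon}\Big)$$
where $\epsilon$ is a small positive number. Then $c(x, y, z)$ is differentiable, $c(\bar x, \bar y, \bar z) = 1$, and $c(x, y, z) = 0$ unless $|x - \bar x| < \epsilon$, $|y - \bar y| < \epsilon$, $|z - \bar z| < \epsilon$. For $\epsilon$ sufficiently small this function will serve as $c_P$.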
2 $S$ near $P$ can be parameterized by $(x, y)$ or $(x, z)$ or $(y, z)$, hence $F_i$ and $F_j$ can be expressed as invertible relations between these two coordinates and $(u, v)$. By composition $F_j^{-1} \circ F_i$ is an invertible relation $(u, v) \to (u, v)$. Set $g = F_j^{-1} \circ F_i$. Then $g$ has positive Jacobian if and only if $g^*(\sigma)$ is a positive multiple of $\sigma$ for every 2-form $\sigma$ in $uv$. Since every 2-form $\sigma$ in $uv$ can be written in the form $\sigma = F_j^*(\omega)$ where $\omega$ is a 2-form in $xyz$, it follows that this is true if and only if $F_i^*(\omega) = (F_j \circ F_j^{-1} \circ F_i)^*(\omega) = (F_j^{-1} \circ F_i)^*[F_j^*(\omega)] = g^*(\sigma)$ is a positive multiple of $F_j^*(\omega) = \sigma$ for all $\omega$, q.e.d.


3 Let $x = f_1(u, v)$, $y = f_2(u, v)$, $z = f_3(u, v)$ be the parameterization of the torus given in Ex. 7, §2.5. Then any square $\{\bar u \le u \le \bar u + 2\pi,\ \bar v \le v \le \bar v + 2\pi\}$ in the $uv$-plane parameterizes the entire torus. The only problem is to include every point of the torus in the interior of the image of such a square. The 3 squares obtained by setting $(\bar u, \bar v) = (0, 0), (2\pi/3, 2\pi/3), (4\pi/3, 4\pi/3)$ accomplish this.
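The covering claim reduces to a pigeonhole argument. The image of the open square with corner $(\bar u, \bar v)$ misses only the circles $u = \bar u$ and $v = \bar v$. A point fails for a given square only if $u$ or $v$ equals that square's offset, and $u$ and $v$ can each match at most one of three distinct offsets. The sketch below measures angles in units of $2\pi$, which is a convention chosen here. It checks a grid of points and shows that two squares would not suffice.

```python
from fractions import Fraction

# Angles in units of 2*pi, reduced to [0, 1); offsets are (0,0), (1/3,1/3), (2/3,2/3).
OFFSETS = [Fraction(0), Fraction(1, 3), Fraction(2, 3)]


def covered(u, v, offsets=OFFSETS):
    """True if (u, v) lies off both boundary circles u = a, v = a for some offset a."""
    return any(u % 1 != a and v % 1 != a for a in offsets)


grid = [Fraction(i, 12) for i in range(12)]
print(all(covered(u, v) for u in grid for v in grid))            # True
print(covered(Fraction(0), Fraction(1, 3), offsets=OFFSETS[:2]))  # False
```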


§6.4 page 218
1 If the map is affine,
$$x = au + bv + c,\qquad y = a'u + b'v + c',$$
then one can assume, by rotating the $xy$-plane if necessary, that $b' > 0$. Then $y = c'$ is not a boundary of $R$ because points where $y = \pm b'\epsilon + c'$ are in $R$. Therefore $x = c$ must be a boundary of $R$, because $R$ is only allowed two kinds of boundaries ($x = \text{const.}$ or $y = \text{const.}$) and the point $(a\epsilon + c, a'\epsilon + c')$ must lie outside $R$ while $(c, c')$ must lie inside. Therefore $b = 0$. Therefore $a > 0$. The remaining statements follow easily.


2 If $f$ has continuous second partials then the functions $\partial y_i/\partial x_j$ are differentiable; thus if $\omega = A\,dy_1 + \cdots$ is a differentiable form then $f^*(\omega) = A(\partial y_1/\partial x_1)\,dx_1 + \cdots$ and all terms in $f^*(\omega)$ are differentiable because sums, products, and compositions of differentiable functions are differentiable. Conversely, since $dy_i$ is differentiable, the assumption that the pullback of a differentiable $k$-form is differentiable implies that $f^*(dy_i) = (\partial y_i/\partial x_1)\,dx_1 + \cdots +$




$(\partial y_i/\partial x_n)\,dx_n$ is differentiable, hence that the functions $\partial y_i/\partial x_j$ are differentiable, hence that the functions $y_i$ are twice differentiable.
3 One possibility is the following. Let $F_1\colon (u, v) \to (x, y, z)$ be the affine map which carries $(0, 0), (0, 2), (2, 0)$ to $P_0, P_1, P_2$ respectively and let $R_1 = \{0 \le u \le 1,\ 0 \le v \le 1\}$. Similarly let $F_2, F_3$ be the affine maps which carry $(0, 0), (0, 2), (2, 0)$ to $P_1, P_2, P_0$ and to $P_2, P_0, P_1$ respectively, and let $R_2 = R_3 = R_1$. This is almost a set of charts for $P_0P_1P_2$, except that the midpoint of each side is not in the interior of any chart. Let $F_4\colon (u, v) \to (x, y, z)$ be the affine map which carries $(-2, 0), (2, 0), (0, 2)$ to $P_0, P_1, P_2$ and let $R_4 = \{-1 \le u \le 1,\ 0 \le v \le 1\}$. Defining $F_5, F_6$ analogously gives a set of charts.


§6.5 page 223


1 If $R_i$ is all of $\{|u_j| < 1\}$ then no points of $\partial S$ are involved. If $R_i$ has just one side inside $\{|u_j| < 1\}$, then, by rotating $u_1u_2\cdots u_k$-space if necessary, it can be assumed that this side is of the form $u_k = \text{const.}$ and that $R_i$ lies on the side $u_k \le \text{const.}$ Map the square $\{|v_1| < 1, \ldots, |v_{k-1}| < 1\}$ to this side of $R_i$ by $u_j = v_j$ ($j = 1, \ldots, k-1$). Composing with the $F_i$, the resulting maps $(v_1, \ldots, v_{k-1}) \to (x_1, x_2, \ldots, x_n)$ give a set of charts for $\partial S$, as can be proved by the method of Ex. 1, §6.4. [The case $k = 0$ must be handled separately.]
2 Let $R = \{a_i \le u_i \le b_i\}$ where $a_i < b_i$, $i = 1, 2, \ldots, k$. If $\omega = A_1\,du_2\,du_3 \cdots du_k$ the formula is
$$\int_{\partial R} \omega = \int_{a_i \le u_i \le b_i,\ i > 1} [A_1(b_1, u_2, u_3, \ldots, u_k) - A_1(a_1, u_2, \ldots, u_k)]\,du_2\,du_3 \cdots du_k.$$
The other terms of the general formula are accompanied by the sign $(-1)^{i-1}$, where $u_i$ is the variable held constant in the integration.


§6.6 page 225


1 A function (assigning numbers to points). A finite collection of points, for each of which a sign ($\pm$) is designated.


2 If material is continuously distributed in 
xyz-space then ‘mass’ assigns to each compact 
3-dimensional region of xyz-space the mass of 
the material contained in the region. ‘Density’ 
is the ratio of mass to volume (= dx dy dz) for 
infinitesimal cubes in xyz-space. 


§7.1 pages 234-235 
1 The formula
$$u^{(N+1)} = u^{(N)} + \big(1 - x^2 - (u^{(N)})^2\big)/2$$
with the initial approximation $u^{(0)} = 1$ gives easily $u^{(3)} = 1 - x^2/2 - x^4/8 - x^6/16 - x^8/128$, whereas the binomial series $(1 + y)^n = 1 + ny + n(n-1)y^2/2 + \cdots$ (see §8.4) gives $(1 - x^2)^{1/2} = 1 - (1/2)x^2 - (1/8)x^4 - (1/16)x^6 - (5/128)x^8 - \cdots$. The two agree through the $x^6$ term.
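The agreement through $x^6$ can be confirmed with exact polynomial arithmetic. The sketch below assumes the iteration $u^{(N+1)} = u^{(N)} + (1 - x^2 - (u^{(N)})^2)/2$ shown above. It stores polynomials in $t = x^2$ as coefficient lists, where the index is the power of $t$. The helper names are illustrative.

```python
from fractions import Fraction


def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def sqrt_iterates(steps):
    """u^(steps) for u <- u + (1 - t - u^2)/2, starting from u^(0) = 1 (t = x^2)."""
    target = [Fraction(1), Fraction(-1)]  # 1 - t
    u = [Fraction(1)]
    for _ in range(steps):
        residual = poly_add(target, [-c for c in poly_mul(u, u)])
        u = poly_add(u, [c / 2 for c in residual])
    return u


def binomial_sqrt_coeffs(count):
    """Coefficients of t^k in (1 - t)^(1/2), from the binomial series."""
    coeffs, c = [], Fraction(1)
    for k in range(count):
        coeffs.append(c)
        c = c * (Fraction(1, 2) - k) / (k + 1) * -1
    return coeffs


print(sqrt_iterates(3))          # 1, -1/2, -1/8, -1/16, -1/128
print(binomial_sqrt_coeffs(5))   # 1, -1/2, -1/8, -1/16, -5/128
```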

2 The line through $(x^{(N)}, f(x^{(N)}))$ parallel to the original tangent line is the line $(y - f(x^{(N)}))/(x - x^{(N)}) = a$. This line intersects the line $y = \bar y$ at $(x^{(N+1)}, \bar y)$ where $x^{(N+1)} = x^{(N)} + (\bar y - f(x^{(N)}))/a$. The process of solving the equation $y = bx + c$ when $y$ is given, and when the approximate slope $a$ is used, leads to $x^{(N+1)} = x^{(N)} + (y - bx^{(N)} - c)/a$. This gives
$$x^{(N)} = (1 + r + r^2 + \cdots + r^{N-1})(y - c)/a + r^Nx^{(0)} = (1 - r^N)(y - c)/b + r^Nx^{(0)}$$
where $r = 1 - (b/a)$. This converges if and only if $|r| < 1$. When this is the case $(y - c)/b - x^{(N)} = r^N\big((y - c)/b - x^{(0)}\big)$, so the error decreases by a factor of $r = 1 - (b/a)$ with each step.
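The error law $e_N = r^Ne_0$ can be seen exactly on a small linear equation. This sketch uses numbers chosen for illustration: $2x + 1 = 7$ with approximate slope $a = 5/2$, so $r = 1/5$. With $a = 1/2$, which gives $r = -3$, the iteration diverges.

```python
from fractions import Fraction


def fixed_slope_iterates(b, c, y, a, x0, steps):
    """Iterates of x <- x + (y - b x - c)/a for the linear equation y = b x + c."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + (y - b * xs[-1] - c) / a)
    return xs


b, c, y = Fraction(2), Fraction(1), Fraction(7)   # exact solution x = (y - c)/b = 3
a = Fraction(5, 2)                                # approximate slope
r = 1 - b / a                                     # = 1/5
xs = fixed_slope_iterates(b, c, y, a, Fraction(0), 6)
errors = [x - (y - c) / b for x in xs]
print(errors)   # each error is 1/5 of the previous one
```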


§7.2 pages 240-242


1 $x_i^{(N+1)} = (1/a_{ii})\big[y_i - \sum_{j \ne i} a_{ij}x_j^{(N)}\big] = x_i^{(N)} + (1/a_{ii})\big[y_i - \sum_{j=1}^n a_{ij}x_j^{(N)}\big]$, hence $x^{(N+1)} = x^{(N)} + M[y - Lx^{(N)}]$ where $M$ is the matrix which is $1/a_{ii}$ in the $i$th row of the $i$th column ($i = 1, 2, \ldots, n$) and which is zero elsewhere. (5) is satisfied if and only if $\sum_{j \ne i} |a_{ij}| < |a_{ii}|$ for $i = 1, 2, \ldots, n$; in words, the ‘diagonal term’ $a_{ii}$ in each row must be larger in absolute value than the sum of the absolute values of the other terms in that row.
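The iteration above is the Jacobi method. Below is a minimal sketch of it with illustrative function names, applied to a diagonally dominant $3 \times 3$ system chosen here so that the exact solution is $(1, 2, 3)$.

```python
def jacobi(a, y, x0, sweeps):
    """Iterate x <- x + M(y - Lx), M = diag(1/a_ii); every component uses the old x."""
    n = len(y)
    x = list(x0)
    for _ in range(sweeps):
        residual = [y[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + residual[i] / a[i][i] for i in range(n)]
    return x


def diagonally_dominant(a):
    """Row condition sum_{j != i} |a_ij| < |a_ii| for every row i."""
    return all(
        sum(abs(a[i][j]) for j in range(len(a)) if j != i) < abs(a[i][i])
        for i in range(len(a))
    )


A = [[4.0, 1.0, 1.0], [1.0, 5.0, 2.0], [0.0, 1.0, 3.0]]
y = [9.0, 17.0, 11.0]          # exact solution (1, 2, 3)
x = jacobi(A, y, [0.0, 0.0, 0.0], 100)
print(diagonally_dominant(A), x)
```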


2 The terms on the diagonal ($a_{ii}$) should be large relative to the terms not on the diagonal ($a_{ij}$ where $i \ne j$). The Gauss-Seidel method would be expected to converge more quickly because it uses a more recent approximation to $x_j$ ($j < i$) in computing $x_i^{(N+1)}$.
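The expected speed-up can be seen on a small system chosen for illustration: $2x_1 + x_2 = 3$, $x_1 + 2x_2 = 3$, with solution $(1, 1)$. Starting from $(0, 0)$, the Jacobi error halves with each sweep, while the Gauss-Seidel error falls by a factor of 4 after the first sweep. All numbers here are dyadic, so the floating-point values are exact.

```python
def jacobi_sweep(a, y, x):
    n = len(y)
    return [(y[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i] for i in range(n)]


def gauss_seidel_sweep(a, y, x):
    n = len(y)
    x = list(x)
    for i in range(n):
        # entries x[j] with j < i were already updated earlier in this sweep
        x[i] = (y[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
    return x


def errors_after(sweep, a, y, exact, sweeps):
    """Max-norm error after each sweep, starting from the zero vector."""
    x, out = [0.0] * len(y), []
    for _ in range(sweeps):
        x = sweep(a, y, x)
        out.append(max(abs(p - q) for p, q in zip(x, exact)))
    return out


A, y, exact = [[2.0, 1.0], [1.0, 2.0]], [3.0, 3.0], [1.0, 1.0]
jac = errors_after(jacobi_sweep, A, y, exact, 10)
gs = errors_after(gauss_seidel_sweep, A, y, exact, 10)
print(jac[:3], gs[:3])   # [0.5, 0.25, 0.125] [0.5, 0.125, 0.03125]
```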

3 Self-checking 

4 Check by using the formula for the inverse of a matrix (§4.4) to express the entries of $L^{-1}$ as explicit rational numbers.

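6 Set $ML = (c_{ij})$. Then (1′) is $x_i^{(N+1)} = x_i^{(N)} + (My)_i - \sum_{j<i} c_{ij}x_j^{(N+1)} - \sum_{j \ge i} c_{ij}x_j^{(N)}$. Let $ML = T + U$ where $T$ is $c_{ij}$ if $j < i$ and 0 if $j \ge i$, and where $U$ is $c_{ij}$ if $j \ge i$ and 0 if $j < i$. Then
$$x^{(N+1)} = x^{(N)} + My - Tx^{(N+1)} - Ux^{(N)},$$
$$x^{(N+1)} + Tx^{(N+1)} = x^{(N)} + Tx^{(N)} + My - Tx^{(N)} - Ux^{(N)},$$
$$(I + T)x^{(N+1)} = (I + T)x^{(N)} + My - (T + U)x^{(N)},$$
$$x^{(N+1)} = x^{(N)} + M'(y - Lx^{(N)})$$
where $M' = (I + T)^{-1}M$. The product of $I + T$ and $I - T + T^2 - T^3 + \cdots + (-T)^{n-1}$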




is $I - (-T)^n$, so it suffices to show that $T^n$ is 0. It is easily shown that $T^k$ has the property that the entry in the $i$th row and the $j$th column is zero unless $j \le i - k$, which implies $T^n = 0$.

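7 Write $(b_{ij}) = ML - I$. It suffices to assume that $\sum_{i=1}^n \max\{|b_{i1}|, |b_{i2}|, \ldots, |b_{in}|\} < 1$ because
$$|MLx - x| = \sum_{i=1}^n \Big|\sum_{j=1}^n b_{ij}x_j\Big| \le \sum_{i=1}^n \mu_i|x| = \rho|x|,$$
where $|x| = \sum_j |x_j|$, $\mu_i = \max\{|b_{i1}|, \ldots, |b_{in}|\}$ and $\rho = \sum_{i=1}^n \mu_i$.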


8 Exactly as in 7 it suffices to assume that the $p$-norm (see §9.8) of the $q$-norms of the rows is less than 1.

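9 $|x^{(M)} - x^{(N)}| \le \sum_{j=N}^{M-1} |x^{(j+1)} - x^{(j)}| \le \sum_{j=N}^{M-1} \rho^j|x^{(1)} - x^{(0)}| \le \rho^N(1 - \rho)^{-1}|x^{(1)} - x^{(0)}|$, and the desired result follows by letting $M \to \infty$. This estimate is more useful because $\bar x$ is not known, whereas $x^{(0)}, x^{(1)}$ are known.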
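The practical value of the estimate is that it uses only $x^{(0)}$ and $x^{(1)}$. The sketch below checks it on a $2 \times 2$ Jacobi iteration. One assumption differs from the text: it measures errors in the max-norm, in which $\rho$ is the largest absolute row sum of $B = I - ML$, instead of the norms of Exercises 7 and 8. The argument of Exercise 9 works for any norm in which each step contracts by $\rho$.

```python
def iterate_affine(b, d, x0, steps):
    """Iterates of x <- Bx + d, i.e. x + M(y - Lx) with B = I - ML and d = My."""
    xs = [list(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append([sum(b[i][j] * x[j] for j in range(len(x))) + d[i] for i in range(len(x))])
    return xs


def max_norm(v):
    return max(abs(t) for t in v)


def rho_bound(b):
    """Contraction factor of B in the max-norm: the largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in b)


B = [[0.0, -0.5], [-0.5, 0.0]]     # Jacobi matrix for L = [[2, 1], [1, 2]]
d = [1.5, 1.5]                     # M y with y = (3, 3); the fixed point is (1, 1)
xs = iterate_affine(B, d, [0.0, 0.0], 30)
rho = rho_bound(B)
first_step = max_norm([p - q for p, q in zip(xs[1], xs[0])])
```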


10 If $Lx = 0$ then $|MLx - x| = |-x| = |x|$, and (3) implies $|x| \le \rho|x|$ with $\rho < 1$, hence $|x| = 0$. Thus (3) implies $L$ is one-to-one, hence by dimensionality $L$ is onto. The method of 9 shows that (3) implies (1) converges, and passage to the limit in (1) gives $x^{(\infty)} = x^{(\infty)} + M(y - Lx^{(\infty)})$, $M(y - Lx^{(\infty)}) = 0$. Set $z = y - Lx^{(\infty)}$. Since $L$ is onto, there is a $w$ such that $z = Lw$. Then $\rho|w| \ge |MLw - w| = |Mz - w| = |-w| = |w|$, so $w = 0$, $y - Lx^{(\infty)} = z = Lw = 0$, $y = Lx^{(\infty)}$, q.e.d.


11 Let $x$ be the $n$-tuple $(x_1, x_2, \ldots, x_n)$ which is 1 in the $i$th position and 0 in the other positions. Then the $n$-tuple $A_Nx$ is equal to the $i$th column of $A_N$, so the equation $\lim_{N\to\infty} A_Nx = Bx$ implies that the limit of the $i$th column of $A_N$ is the $i$th column of $B$. Since this holds for all $i$, the desired conclusion follows.


13 The solution $\bar x$ exists, since otherwise $L$ would not be onto and consequently $Q$ would not be positive definite. At the step in the Gauss-Seidel iteration where $x_i$ is being ‘corrected’, the other $x$'s are held constant and $x$ is moved to the point on the line $x_j = \text{const.}$ ($j \ne i$) where $Q(x - \bar x)$ is a minimum. This follows from two observations. At the minimum the partial derivative of $Q(x - \bar x)$ with respect to $x_i$ must be zero, $\sum_j 2a_{ij}(x_j - \bar x_j) = 0$, $\sum_j a_{ij}x_j = y_i$. The Gauss-Seidel iteration consists in using this equation $\sum_j a_{ij}x_j = y_i$ and the fixed values of $x_j$ ($j \ne i$) to determine $x_i$. If $n$ consecutive steps do not change $x$ then $\sum_j a_{ij}x_j = y_i$ for all $i$ and $x$ is the solution $\bar x$. Otherwise, since Gauss-Seidel is (1′) when $M$ is as in Ex. 1, Ex. 6 implies it can be written


$$x^{(N+1)} = x^{(N)} + M'(y - Lx^{(N)}),\quad x^{(N+1)} - \bar x = x^{(N)} - \bar x + M'L(\bar x - x^{(N)}),\quad x^{(N+1)} - \bar x = N(x^{(N)} - \bar x)$$
where $N = I - M'L$. Let $\rho < 1$ be the maximum value of $Q(Ny)$ subject to the constraint $Q(y) = 1$. Then $Q(y) = k$ implies $Q(Ny) \le \rho k$, hence $Q(x^{(N)} - \bar x) \le \rho^NQ(x^{(0)} - \bar x)$. Let $m > 0$ be the minimum value of $Q(x)$ subject to the constraint $|x| = 1$. Then $Q(x) \le km$ implies $|x|^2 \le k$. Hence $|x^{(N)} - \bar x|^2 \le \rho^NQ(x^{(0)} - \bar x)/m \to 0$, and $x^{(N)} \to \bar x$ as desired regardless of the choice of $x^{(0)}$.
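The minimization argument can be watched directly. Take $Q(x) = x^TAx$ with $A$ symmetric positive definite, which is an assumption consistent with the argument above. The energy $E(x) = \tfrac12 x^TAx - y^Tx$ differs from $\tfrac12 Q(x - \bar x)$ only by a constant, so each single-coordinate correction can only decrease it. The matrix below was chosen for illustration. It is positive definite but its middle row is not strictly diagonally dominant, so the criterion of Ex. 1 does not apply, yet Gauss-Seidel still converges.

```python
def energy(a, y, x):
    """E(x) = x.Ax/2 - y.x, equal to Q(x - xbar)/2 up to an additive constant."""
    n = len(y)
    quad = sum(x[i] * a[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(y[i] * x[i] for i in range(n))


def gauss_seidel_with_energy(a, y, x0, sweeps):
    """Gauss-Seidel sweeps, recording E after every single-coordinate correction."""
    n = len(y)
    x = list(x0)
    history = [energy(a, y, x)]
    for _ in range(sweeps):
        for i in range(n):
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (y[i] - s) / a[i][i]   # minimizes E along the x_i direction
            history.append(energy(a, y, x))
    return x, history


# Symmetric positive definite; row 2 is not strictly diagonally dominant (3 + 1 = 4).
A = [[4.0, 3.0, 0.0], [3.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
y = [7.0, 6.0, 3.0]                 # exact solution (1, 1, 1)
x, history = gauss_seidel_with_energy(A, y, [0.0, 0.0, 0.0], 200)
```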


§7.3 pages 244-245


1 The formula is an immediate consequence of (3). When $y = 2$ and $x^{(0)} = 1$ this gives $x^{(1)} = 3/2$, $x^{(2)} = 17/12$, $x^{(3)} = 577/408$, $x^{(4)} = 665857/470832$. Now $x^{(1)} - x^{(2)} = 1/(2 \cdot 2 \cdot 3)$, $x^{(2)} - x^{(3)} = 1/(2 \cdot 12 \cdot 17)$, $x^{(3)} - x^{(4)} = 1/(2 \cdot 408 \cdot 577)$ and, by an easy calculation, $x^{(4)} - x^{(5)} = 1/(2 \cdot 665857 \cdot 470832)$. Thus the step from $x^{(4)}$ to $x^{(5)}$ affects only the 12th decimal place, and division of 665,857 by 470,832 gives $\sqrt2$ to eleven places. If decimals are used throughout the calculation then the first eleven places of $x^{(4)}$ are correct. The answer is $\sqrt2 = 1.41421\,35623\,73095$ to fifteen places.
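The iterates and the pattern of differences can be reproduced with exact fractions. This sketch assumes that (3) is the Newton step $x^{(N+1)} = (x^{(N)} + y/x^{(N)})/2$, which reproduces the iterates listed above. Each difference equals $1/(2pq)$ for $x^{(N)} = p/q$ because $p^2 - 2q^2 = 1$.

```python
from fractions import Fraction


def newton_sqrt_iterates(y, x0, steps):
    """Exact iterates of x <- (x + y/x)/2, Newton's method for x^2 = y."""
    xs = [Fraction(x0)]
    for _ in range(steps):
        xs.append((xs[-1] + Fraction(y) / xs[-1]) / 2)
    return xs


xs = newton_sqrt_iterates(2, 1, 5)
print(xs[1:5])          # 3/2, 17/12, 577/408, 665857/470832
print(float(xs[4]))
```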


2 $x^{(N+1)} = \frac13\big[2x^{(N)} + y/(x^{(N)})^2\big]$. The cube root of 2 is 1.25992 10498 to ten decimal places.
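A floating-point run of the cube-root formula is sketched below. The starting value $x^{(0)} = 1$ and the step count are choices made here, not taken from the text.

```python
def newton_cbrt(y, x0=1.0, steps=8):
    """Iterate x <- (2x + y/x^2)/3, Newton's method for x^3 = y."""
    x = x0
    for _ in range(steps):
        x = (2 * x + y / x**2) / 3
    return x


print(f"{newton_cbrt(2.0):.12f}")
```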


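3 If $x$ is a positive number such that $x^2 = \pi$ with six-place accuracy, and if $\delta = x - \sqrt\pi$, then $x^2 = (\sqrt\pi + \delta)^2 = \pi + 2\delta\sqrt\pi + \delta^2 = \pi + \delta\sqrt\pi + \delta(\sqrt\pi + \delta) = \pi + \delta(\sqrt\pi + x)$, so $\delta = (x^2 - \pi)/(\sqrt\pi + x)$. Since $x > 1$ and $\sqrt\pi > 1$, this gives $|\delta| < \frac12|x^2 - \pi|$. Thus it suffices to retain 6 places of $\pi$. The square root of $\pi$ is 1.77245 38509 to ten places.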


4 Newton's method gives the formula $x_{N+1} = -(1 + 16x_N^3)/[12(1 - 2x_N^2)]$. Taking $x_0 = 0$ gives $x_1 = -1/12$, $x_2 = -(1/12)(214/213)$. Now $214/213 = 1.004695\ldots$, from which $x_2 = -.0837246\ldots$ Take $x_2 = -.08372$. Then $(1 + 16x_2^3)(1 - 2x_2^2)^{-1}$ can be found without too much calculation to be about 1.004697, from which $x_3 = -.0837247\ldots$ One can safely say then that $-.083725$ represents the root to 6 places.
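The exercise's equation is not shown in this excerpt. The step $x_{N+1} = -(1 + 16x_N^3)/[12(1 - 2x_N^2)]$ is exactly $x - f(x)/f'(x)$ for $f(x) = 8x^3 - 12x - 1$, which is therefore the assumed equation, inferred from the formula rather than quoted. The sketch below uses exact fractions to confirm $x_1$ and $x_2$, then runs a few floating-point steps.

```python
from fractions import Fraction


def newton_step(x):
    """x - f(x)/f'(x) for the assumed f(x) = 8x^3 - 12x - 1."""
    return -(1 + 16 * x**3) / (12 * (1 - 2 * x**2))


x1 = newton_step(Fraction(0))      # -1/12
x2 = newton_step(x1)               # -(1/12)(214/213)

x = 0.0
for _ in range(6):
    x = newton_step(x)
print(x1, x2, x)
```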


5 The diagram is changed in that the line through $(x^{(N)}, f(x^{(N)}))$ parallel to the tangent at $(x^{(0)}, f(x^{(0)}))$ is replaced by the tangent at $(x^{(N)}, f(x^{(N)}))$.

6 $x^{(N+1)} = x^{(N)} + \big(y - 1/x^{(N)}\big)\big/\big(-(x^{(N)})^{-2}\big)$




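$= 2x^{(N)} - y(x^{(N)})^2$. If $y = 1 - \epsilon$ and $x^{(0)} = 1$ then $x^{(1)} = 2 - (1 - \epsilon) = 1 + \epsilon$, $x^{(2)} = 2(1 + \epsilon) - (1 - \epsilon)(1 + \epsilon)^2 = (1 + \epsilon)[2 - (1 - \epsilon)(1 + \epsilon)] = (1 + \epsilon)(1 + \epsilon^2)$, $x^{(3)} = x^{(2)}[2 - (1 - \epsilon)x^{(2)}] = x^{(2)}[2 - (1 - \epsilon^4)] = (1 + \epsilon)(1 + \epsilon^2)(1 + \epsilon^4)$, $x^{(4)} = (1 + \epsilon)(1 + \epsilon^2)(1 + \epsilon^4)(1 + \epsilon^8)$, etc.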
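The product formula follows from $(1 - \epsilon)x^{(N)} = 1 - \epsilon^{2^N}$, so each step squares the error. This division-free iteration can be checked exactly. The sample value $\epsilon = 1/10$ below is chosen for illustration.

```python
from fractions import Fraction


def reciprocal_iterates(y, x0, steps):
    """Iterates of x <- 2x - y x^2, Newton's method for 1/x = y (no division needed)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(2 * x - y * x * x)
    return xs


eps = Fraction(1, 10)
xs = reciprocal_iterates(1 - eps, Fraction(1), 4)
print([float(x) for x in xs], float(1 / (1 - eps)))
```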


§7.4 pages 251-257


1 (a) The arrows form a vortex rotating counterclockwise. (b) The derivative of this function of $t$ is zero, hence the function is constant. Geometrically, all points $(x(t), y(t))$ of the solution curve lie on a circle. (c) Adopting a slightly different notation from that of the text,
$$\begin{aligned}
x_0(t) &= 1, & y_0(t) &= 0,\\
x_1(t) &= 1 - \int_0^t y_0(u)\,du = 1, & y_1(t) &= 0 + \int_0^t x_0(u)\,du = t,\\
x_2(t) &= 1 - \int_0^t y_1(u)\,du = 1 - \frac{t^2}{2}, & y_2(t) &= 0 + \int_0^t x_1(u)\,du = t,\\
x_3(t) &= 1 - \int_0^t y_2(u)\,du = 1 - \frac{t^2}{2}, & y_3(t) &= 0 + \int_0^t x_2(u)\,du = t - \frac{t^3}{6},
\end{aligned}$$
etc. In general $x_{2N}(t) = x_{2N+1}(t) = 1 - (t^2/2) + (t^4/24) - \cdots + (-1)^N(t^{2N}/(2N)!)$


and $y_{2N+1}(t) = y_{2N+2}(t) = t - (t^3/6) + \cdots + (-1)^N(t^{2N+1}/(2N+1)!)$. The method of p. 248 (choose $J$ such that $J > |t|$ and set $\rho = |t|/J < 1$) can be used to show that $\lim_{N\to\infty} x_N(t)$ and $\lim_{N\to\infty} y_N(t)$ exist for all $t$. The fact that the functions so defined are continuous and satisfy the differential equation follows, as before, by passing to the limit under the integral sign (see Ex. 7). (d) The functions $(x(t), y(t)) = (\bar x\cos t - \bar y\sin t,\ \bar x\sin t + \bar y\cos t)$ begin at $(\bar x, \bar y)$ when $t = 0$ and satisfy the equation. (e) The functions $(\cos(a + t), \sin(a + t))$ and $(\cos a\cos t - \sin a\sin t,\ \cos a\sin t + \sin a\cos t)$ both start at $(\cos a, \sin a)$ and both satisfy the equation, therefore they must be identical. This gives the addition formulas.


(f)
$$\begin{pmatrix}\cos a & \sin a\\ -\sin a & \cos a\end{pmatrix}\begin{pmatrix}\cos b & \sin b\\ -\sin b & \cos b\end{pmatrix} = \begin{pmatrix}\cos(a + b) & \sin(a + b)\\ -\sin(a + b) & \cos(a + b)\end{pmatrix}.$$
(g) Multiplying formally, setting $i^2 = -1$ and using ‘$a + ib = c + id$ implies $a = c$, $b = d$’, the given equation implies the addition formulas. (h) $\cos t > 0$ for small $t$, hence $\sin t$ is increasing for small $t$, hence $\sin t$ is positive for small positive $t$, hence $\cos t$ is decreasing for small positive $t$. It can stop decreasing only if $\sin t = 0$, which implies $\cos t = \pm1$. Since $\cos t$ starts at 1 and decreases, this implies $\cos t$ decreases until it reaches $-1$. Then an analogous argument shows that it increases until $\cos t = 1$ again. (i) $2\pi$ is defined to be the smallest positive solution $t$ of the equation $\cos t = 1$. Then $\sin 2\pi = 0$, and by the addition formulas $\cos(t + 2\pi) = \cos t$, $\sin(t + 2\pi) = \sin t$ for all $t$. The functions $(\cos t, \sin t)$ for $\{0 \le t \le 2\pi\}$ parameterize the boundary of the disk $D = \{x^2 + y^2 \le 1\}$, hence
$$\iint_D dx\,dy = \frac12\iint_D d(x\,dy - y\,dx) = \frac12\int_{\partial D}(x\,dy - y\,dx) = \frac12\int_0^{2\pi}[\cos^2 t\,dt + \sin^2 t\,dt] = \frac12\int_0^{2\pi} dt = \pi.$$
(j) The advantage lies in the periodicity of the functions, $\cos(t + 2\pi) = \cos t$, $\sin(t + 2\pi) = \sin t$. Once the values in the range $\{0 \le t \le 2\pi\}$ are known (and it is most convenient to space $t$ evenly over this interval), all other values can be found by periodicity. (k)-(n) Compare to a trig table.
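The successive approximations of (c) can be generated mechanically with exact polynomial arithmetic. The sketch below, with illustrative helper names, applies $x_{N+1} = 1 - \int_0^t y_N$ and $y_{N+1} = \int_0^t x_N$. It confirms $x_{2N} = x_{2N+1}$ and compares $x_{12}$ with the cosine series through $t^{12}$.

```python
from fractions import Fraction
from math import factorial


def integrate(p):
    """Coefficients of t -> integral_0^t p(u) du (p given by its coefficient list)."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(p)]


def subtract(p, q):
    n = max(len(p), len(q))
    p, q = p + [Fraction(0)] * (n - len(p)), q + [Fraction(0)] * (n - len(q))
    return [s - t for s, t in zip(p, q)]


def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def picard(steps):
    """Successive approximations for dx/dt = -y, dy/dt = x, x(0) = 1, y(0) = 0."""
    x, y = [Fraction(1)], [Fraction(0)]
    approximations = [(x, y)]
    for _ in range(steps):
        x, y = trim(subtract([Fraction(1)], integrate(y))), trim(integrate(x))
        approximations.append((x, y))
    return approximations


approx = picard(12)
cos_coeffs = [Fraction((-1) ** (k // 2), factorial(k)) if k % 2 == 0 else Fraction(0) for k in range(13)]
print(approx[12][0] == cos_coeffs)   # True
```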


2 (a) It is the solution of $dx/dt = x$ which satisfies $x(0) = 1$. (b) Both $e^{t+a}$ and $e^te^a$ satisfy the differential equation $dx/dt = x$ and both start at $e^a$. Hence they are identical. (c) $x_0 = 1$, $x_1(t) = 1 + \int_0^t 1\,du = 1 + t$, $x_2(t) = 1 + \int_0^t (1 + u)\,du = 1 + t + (t^2/2)$, etc., $x_N(t) = 1 + t + \cdots + (t^N/N!)$. $\lim_{N\to\infty} x_N(t)$ exists and satisfies the differential equation for all $t$ as in 1(c). (d) $\exp(t) = [\exp(t/2)]^2 > 0$, hence the function $\exp(t)$ is always increasing, hence $\exp(t) > 1$ for $t > 0$, hence $\exp(1/n) - 1 = \int_0^{1/n}\exp(t)\,dt > \int_0^{1/n} dt = 1/n$, $\exp(1) = [\exp(1/n)]^n > (1 + 1/n)^n$. In the same way $1 - \exp(-1/n) = \int_{-1/n}^0\exp(t)\,dt < \int_{-1/n}^0 dt = 1/n$, $(n - 1)/n < \exp(-1/n)$. Multiplying by the positive number $(n/(n - 1))\exp(1/n)$ gives $\exp(1/n) < n/(n - 1)$, $\exp(1/(n + 1)) < (n + 1)/n = 1 + 1/n$, $\exp(1) < (1 + 1/n)^{n+1}$. Thus the difference between $(1 + 1/n)^n$ and $e$ is less than $(1 + 1/n)^{n+1} - (1 + 1/n)^n = (1/n)(1 + 1/n)^n < (1/n)\exp(1) \to 0$, q.e.d. (e) Such a table is a table of $10^x$ for equally spaced rational values of $x$. It suffices to construct such a table for $\{0 \le x \le 1\}$ since the



formula $10^{x+1} = 10 \cdot 10^x$ enables one to find all other values easily. (f) Use the power series of (c). Check by consulting a table of anti-logarithms to base 10. $10^{1.001}$ is $\exp((1001/1000)a)$ because the thousandth power of this number is $\exp(1001a)$, which is the thousand-and-first power of $\exp(a) = 10$.
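The bounds of part (d) are easy to tabulate. This sketch prints $(1 + 1/n)^n$, $(1 + 1/n)^{n+1}$ and their gap for a few sample values of $n$. The gap $(1/n)(1 + 1/n)^n$ stays below $e/n$.

```python
import math


def e_bounds(n):
    """(1 + 1/n)^n < e < (1 + 1/n)^(n + 1), as in part (d)."""
    base = 1 + 1 / n
    return base**n, base ** (n + 1)


for n in (1, 10, 100, 1000):
    low, high = e_bounds(n)
    print(n, low, high, high - low)
```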

3 $\cosh^2 t - \sinh^2 t = 1$, $\cosh(a + b) = \cosh a\cosh b + \sinh a\sinh b$, $\sinh(a + b) = \sinh a\cosh b + \cosh a\sinh b$. $\cosh t = 1 + t^2/2 + \cdots + t^{2n}/(2n)! + \cdots$, $\sinh t = t + t^3/6 + \cdots + t^{2n+1}/(2n + 1)! + \cdots$.


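4
$$\begin{pmatrix}x_1(t)\\ y_1(t)\end{pmatrix} = \begin{pmatrix}\bar x\\ \bar y\end{pmatrix} + \int_0^t M\begin{pmatrix}x_0(u)\\ y_0(u)\end{pmatrix}du = (I + tM)\begin{pmatrix}\bar x\\ \bar y\end{pmatrix}$$
and by induction
$$\begin{pmatrix}x_{N+1}(t)\\ y_{N+1}(t)\end{pmatrix} = \begin{pmatrix}\bar x\\ \bar y\end{pmatrix} + \int_0^t M\begin{pmatrix}x_N(u)\\ y_N(u)\end{pmatrix}du = \Big(I + tM + \frac{t^2M^2}{2!} + \cdots + \frac{t^{N+1}M^{N+1}}{(N+1)!}\Big)\begin{pmatrix}\bar x\\ \bar y\end{pmatrix}.$$
Defining $\exp(tM) = \sum_{n=0}^\infty t^nM^n/n!$ then gives the desired formula.
$$\exp\begin{pmatrix}a & -b\\ b & a\end{pmatrix} = \begin{pmatrix}e^a\cos b & -e^a\sin b\\ e^a\sin b & e^a\cos b\end{pmatrix}.$$
The identity $\exp(z_1 + z_2) = \exp(z_1)\exp(z_2)$ follows by direct multiplication (using 1(f)) or by the last part of this problem. For the matrices $M_1$, $M_2$ that are given,
$$\exp(M_1 + M_2) = \begin{pmatrix}\cosh 1 & \sinh 1\\ \sinh 1 & \cosh 1\end{pmatrix},\qquad \exp M_1\exp M_2 = \begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}\begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix} = \begin{pmatrix}2 & 1\\ 1 & 1\end{pmatrix}.$$
The multiplication formula $\exp(M_1 + M_2) = \exp(M_1)\exp(M_2)$ (when $M_1M_2 = M_2M_1$) can be deduced from the uniqueness of the solution of a differential equation as in Ex. 1 and Ex. 2. Specifically, $\exp(M_1 + tM_2)(X)$ (where $X$ is a fixed $n$-tuple) and $\exp(M_1)\exp(tM_2)(X)$ start at the same point $x = \exp(M_1)(X)$ and satisfy the same differential equation $dx/dt = M_2x$, hence they are identical, and the desired result follows from setting $t = 1$. Algebraically the multiplication formula $\exp(M_1 + M_2) = \exp M_1\exp M_2$ is the formula
$$I + (M_1 + M_2) + (M_1 + M_2)^2/2! + \cdots = (I + M_1 + M_1^2/2! + \cdots)(I + M_2 + M_2^2/2! + \cdots)$$
which, upon equating terms of like degree, becomes the algebraic identity
$$(M_1 + M_2)^n/n! = \sum_{i+j=n} M_1^iM_2^j/(i!\,j!)$$
which is essentially the binomial theorem. This leads to an alternative proof that $\exp(M_1 + M_2) = \exp(M_1)\exp(M_2)$ (provided $M_1M_2 = M_2M_1$).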
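Both cases can be checked with a truncated exponential series. The sketch below handles $2 \times 2$ matrices only and uses illustrative helper names. $M_1 = \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}$ and $M_2 = \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix}$ are read off from $\exp M_1$ and $\exp M_2$ above. They do not commute, and $\exp(M_1 + M_2) \ne \exp M_1\exp M_2$. The closed form for $\exp\begin{pmatrix}a & -b\\ b & a\end{pmatrix}$ is checked at the sample values $a = 0.3$, $b = 1.2$.

```python
import math


def mat_mul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_exp(m, terms=30):
    """Truncated series sum_{n < terms} M^n / n! for a 2 x 2 matrix M."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        power = mat_mul(power, m)
        result = [[result[i][j] + power[i][j] / math.factorial(n) for j in range(2)] for i in range(2)]
    return result


def close(p, q, tol=1e-12):
    return all(abs(p[i][j] - q[i][j]) < tol for i in range(2) for j in range(2))


M1 = [[0.0, 1.0], [0.0, 0.0]]
M2 = [[0.0, 0.0], [1.0, 0.0]]
M_sum = [[M1[i][j] + M2[i][j] for j in range(2)] for i in range(2)]
print(mat_exp(M_sum), mat_mul(mat_exp(M1), mat_exp(M2)))
```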
5 (a) The vertices of the polygon are $(i/10, (11/10)^i)$ for $i = -10, -9, \ldots, 9, 10$. (b) The 21 points are found by the formula


PERO 

is 1 0 

fori = —10, —9,...,9, 10. E.g. for i = 3 the 
(1 ~ 7p)’ (1) i (370) 

point is (a 1 0) = M599) ` (c) For 


each $N$ let $P_N(t)$ denote the function whose graph is the polygon. Let $t > 0$ be given. For each $N$ there is an integer $j$ such that $j/N \le t < (j + 1)/N$. By definition of $P_N$, $(1 + 1/N)^j \le P_N(t) \le (1 + 1/N)^{j+1}$. As $N \to \infty$ this interval becomes arbitrarily short, so it will suffice to show that $(1 + 1/N)^j \to e^t$ as $N \to \infty$. Raising $(1 + 1/N)^j$ to the power $1/t$ and using $N - 1/t < j/t \le N$ gives $(1 + 1/N)^{N - 1/t} < [(1 + 1/N)^j]^{1/t} \le (1 + 1/N)^N$. As $N \to \infty$ this gives $[(1 + 1/N)^j]^{1/t} \to e$, $(1 + 1/N)^j \to e^t$, q.e.d.
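The convergence in (c) can be observed numerically. This sketch evaluates, for $t \ge 0$, the polygon with vertices $(j/N, (1 + 1/N)^j)$ by linear interpolation and compares it with $e^t$. The function name and the sample values of $N$ are illustrative. With $N = 10$ and $t = 1$ it returns the vertex value $(11/10)^{10}$ from part (a).

```python
import math


def euler_polygon_value(t, n):
    """Value at t >= 0 of the polygon with vertices (j/n, (1 + 1/n)**j)."""
    j = math.floor(t * n)
    left, right = (1 + 1 / n) ** j, (1 + 1 / n) ** (j + 1)
    return left + (t * n - j) * (right - left)


for n in (10, 100, 1000, 10**5):
    print(n, euler_polygon_value(1.0, n), math.e)
```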

6 The theorem is not contradicted because $x^{2/3}$ is not differentiable at $x = 0$.

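7 For each fixed $t$ the Cauchy criterion is satisfied by the numbers $x^{(n)}(t)$, hence they determine a limiting value $x^{(\infty)}(t)$ for each $t$. Moreover, $|x^{(\infty)}(t) - x^{(n)}(t)| \le \epsilon$ for all $n \ge N$ and for all $|t| \le \delta$ when $\epsilon$, $N$ are as before. Fixing $n \ge N$ and $\bar t$, the continuity of $x^{(n)}(t)$ implies that there is a $\delta' > 0$ such that $|x^{(n)}(t) - x^{(n)}(\bar t)| < \epsilon$ whenever $|t - \bar t| < \delta'$. Hence
$$|x^{(\infty)}(t) - x^{(\infty)}(\bar t)| \le |x^{(\infty)}(t) - x^{(n)}(t)| + |x^{(n)}(t) - x^{(n)}(\bar t)| + |x^{(n)}(\bar t) - x^{(\infty)}(\bar t)| < 3\epsilon$$
whenever $|t - \bar t| < \delta'$. Since $\epsilon$ was arbitrary, this shows that $x^{(\infty)}(t)$ is a continuous function (see §9.8). Finally, $f(x^{(\infty)}(t))$ is a continuous function, hence $\int_0^t f[x^{(\infty)}(s)]\,ds$ exists and, by (4),
$$\Big|\int_0^t f(x^{(\infty)}(s))\,ds - \int_0^t f(x^{(n)}(s))\,ds\Big| \le K\int_0^t |x^{(\infty)}(s) - x^{(n)}(s)|\,ds \le K\epsilon t \to 0.$$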




8 Define functions $s(t), x(t), y_1(t), y_2(t), \ldots, y_{k-1}(t)$ by applying the theorem of the text to the differential equations
$$\begin{aligned}
ds/dt &= 1, & s(0) &= 0,\\
dx/dt &= y_1, & x(0) &= \text{given},\\
dy_1/dt &= y_2, & y_1(0) &= x'(0) = \text{given},\\
&\ \,\vdots\\
dy_{k-2}/dt &= y_{k-1}, & y_{k-2}(0) &= x^{(k-2)}(0) = \text{given},\\
dy_{k-1}/dt &= f(y_{k-1}, \ldots, y_1, x, s), & y_{k-1}(0) &= x^{(k-1)}(0) = \text{given},
\end{aligned}$$
where $f$ is the solution of (*) for $d^kx/dt^k$ as a function of the remaining variables. Then $x(t)$ is a solution of (*) and is the only solution with the required initial values.


9 Setting $y = dx/dt$ converts this equation to the equation of Ex. 1. Hence the solution is $\bar x\cos t + \bar y\sin t$, where $\bar x$ is the initial value of $x$ and $\bar y$ is its initial derivative.


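10 Identical to 4.
11 Let $M$ be the $k \times k$ matrix
$$M = \begin{pmatrix}0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\\ -a_0 & -a_1 & -a_2 & \cdots & -a_{k-1}\end{pmatrix}.$$
Then for any $k$-tuple $X = (X_1, X_2, \ldots, X_k)$ the first entry of $\exp(tM)(X)$, considered as a function of $t$, is a solution of the given differential equation, and every solution is of this form.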


§7.5 page 264

1 $\pi = 3.14159\,26535\,89793$ to fifteen places.
2 $\log 2 = .69314\,71805$ to ten places.
3 $\log 3 = 1.09861\,22886$ to ten places.


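4 $\log i = \log\big((1 + i)/(1 - i)\big) = 2[i + i^3/3 + i^5/5 + i^7/7 + \cdots] = 2i[1 - 1/3 + 1/5 - 1/7 + \cdots] = 2i\int_0^1 dt/(1 + t^2) = 2i \times (\text{area subtended by the arc from } (1, 0) \text{ to } (0, 1)) = 2i \cdot \pi/4 = i\pi/2$. The exponential of the matrix $\begin{pmatrix}0 & -\pi/2\\ \pi/2 & 0\end{pmatrix}$ is $\begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}$, the matrix representing $i$.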
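The series $\log((1 + z)/(1 - z)) = 2(z + z^3/3 + z^5/5 + \cdots)$ used in Ex. 4 converges slowly at $z = i$ but quickly for small real $z$. The sketch below reproduces the values of Exercises 1-3. The choices are made here and are not taken from the text. It uses $z = 1/3$ for $\log 2$ and $z = 1/2$ for $\log 3$. For $\pi$ it uses the arctangent series together with Euler's identity $\arctan\frac12 + \arctan\frac13 = \pi/4$.

```python
import math


def log_ratio(z, terms=60):
    """log((1 + z)/(1 - z)) = 2 (z + z^3/3 + z^5/5 + ...), valid for |z| < 1."""
    return 2 * sum(z ** (2 * k + 1) / (2 * k + 1) for k in range(terms))


def arctan_series(z, terms=60):
    """arctan z = z - z^3/3 + z^5/5 - ..., converging quickly only for small |z|."""
    return sum((-1) ** k * z ** (2 * k + 1) / (2 * k + 1) for k in range(terms))


log2 = log_ratio(1 / 3)                                 # (1 + 1/3)/(1 - 1/3) = 2
log3 = log_ratio(1 / 2)                                 # (1 + 1/2)/(1 - 1/2) = 3
pi = 4 * (arctan_series(1 / 2) + arctan_series(1 / 3))  # arctan 1/2 + arctan 1/3 = pi/4
print(f"{pi:.15f} {log2:.10f} {log3:.10f}")
```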


§8.1 pages 269-270


1 (a) grad $f = 2x\,\mathbf i + 2y\,\mathbf j + 2z\,\mathbf k$ (b) grad $f = -r^{-3}(x\,\mathbf i + y\,\mathbf j + z\,\mathbf k)$ (c) grad $f = 2(x^2 + y^2)^{-1}(x\,\mathbf i + y\,\mathbf j)$
2 The divergence of (a) is 6, and the divergences of (b) and (c) are zero.


3 For the first one div $X = 3$, curl $X = 0$. For the second one div $X = 2x + xz$, curl $X = [\cos(x + y) - xy]\,\mathbf i - \cos(x + y)\,\mathbf j + yz\,\mathbf k$.

4 As in Exercise 6, §1.5, the value of $A\,dy\,dz + B\,dz\,dx + C\,dx\,dy$ on a parallelogram is equal to the oriented volume of the parallelepiped generated by the parallelogram and by a line segment parallel to the segment from $(0, 0, 0)$ to $(A, B, C)$. This is zero if the line segment is in the plane of the parallelogram, and is the length of the line segment times the area of the parallelogram if they are perpendicular. The same is true of $X \cdot n\,d\sigma$ in these cases (the dot product of perpendicular vectors is zero and of parallel vectors is the product of their lengths). Since both expressions are linear in $(A, B, C)$, the two are equal in all cases.


5 Both numbers are equal to the $3 \times 3$ determinant whose rows are $(A, B, C)$, $(a_1, a_2, a_3)$, and $(b_1, b_2, b_3)$.

6 $X \cdot n\,d\sigma = X \cdot (a \times b)$ for all $X$, therefore $n\,d\sigma = a \times b$.
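Exercises 5 and 6 can be confirmed on concrete vectors chosen for illustration. The 2-form evaluated on the parallelogram spanned by $a$, $b$ equals the determinant with rows $(A, B, C)$, $a$, $b$. When $X = a \times b$ the value is $|a \times b|^2$.

```python
def two_form_on_parallelogram(X, a, b):
    """Value of A dy dz + B dz dx + C dx dy (X = (A, B, C)) on the parallelogram spanned by a, b."""
    A, B, C = X
    dydz = a[1] * b[2] - a[2] * b[1]
    dzdx = a[2] * b[0] - a[0] * b[2]
    dxdy = a[0] * b[1] - a[1] * b[0]
    return A * dydz + B * dzdx + C * dxdy


def det3(r0, r1, r2):
    """3 x 3 determinant with rows r0, r1, r2 (cofactor expansion along r0)."""
    return (r0[0] * (r1[1] * r2[2] - r1[2] * r2[1])
            - r0[1] * (r1[0] * r2[2] - r1[2] * r2[0])
            + r0[2] * (r1[0] * r2[1] - r1[1] * r2[0]))


X, a, b = (1, 2, 3), (4, 0, -1), (2, 5, 1)
print(two_form_on_parallelogram(X, a, b), det3(X, a, b))   # 53 53
```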


7 A small rectangle with sides $du$, $dv$ in the $uv$-plane parameterizes a small piece of $S$ which is nearly a parallelogram. Denoting the sides of this parallelogram by $du$, $dv$, the value of the integrand on the infinitesimal parallelogram is $X \cdot (du \times dv)$ by Exercises 4 and 6.


§8.2 pages 276-277


1 (a) $F(x, y) = x^2 + y^2$ (b) $F(x, y) = \frac12\log(1 + x^2) + \operatorname{Arctan} y$ (c) $1/(yx^2)$ is an integrating factor. $F(x, y) = \log y - (y/x)$ (d) $F(x, y) = x^2 + xy + y^2$ (e) $F(x, y) = \log|y| + \sin x$ (f) $e^{x^2}(x^2 + y^2)$ (g) $e^{x^2}$ is an integrating factor, as is shown by the last




example of the text. Thus the equation is $e^{x^2}dy + 2xe^{x^2}y\,dx = 2x^3e^{x^2}dx$, $d[e^{x^2}y] = d[x^2e^{x^2}] - 2xe^{x^2}dx$, $d[e^{x^2}y] = d[x^2e^{x^2} - e^{x^2}]$, $e^{x^2}y = (x^2 - 1)e^{x^2} + \text{const.}$, so $F(x, y) = e^{x^2}[y + 1 - x^2]$ has the desired property.
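That $F$ is constant along solutions can be checked numerically. The sketch reads the slope $dy/dx = 2x^3 - 2xy$ off the displayed equation, since the exercise statement itself is not shown here. It integrates that equation with the classical Runge-Kutta method from the sample starting point $(0, 0.5)$ and compares $F$ at both ends.

```python
import math


def F(x, y):
    """Candidate first integral e^{x^2} (y + 1 - x^2) from part (g)."""
    return math.exp(x * x) * (y + 1 - x * x)


def slope(x, y):
    """dy/dx read off from e^{x^2} dy + 2x e^{x^2} y dx = 2x^3 e^{x^2} dx."""
    return 2 * x**3 - 2 * x * y


def rk4(f, x0, y0, x1, steps):
    """Classical fourth-order Runge-Kutta integration of dy/dx = f(x, y)."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y


y_end = rk4(slope, 0.0, 0.5, 1.5, 2000)
print(F(0.0, 0.5), F(1.5, y_end))
```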

2 (a) $x\,dx - y\,dy = 0$ (b) $x\,dy + y\,dx = 0$ (c) $(x - 1)\,dx + y\,dy = 0$ (d) $x\,dx + 4y\,dy = 0$ (e) The curves are $x^2 + y^2 + Cy = 1$, which gives, when one solves for $C$ and differentiates, $(1 + y^2 - x^2)\,dy + 2xy\,dx = 0$.

3 (a) and (b) are orthogonal trajectories of each other. The radial lines $y = C(x - 1)$ are orthogonal to (c). The curves $y = \text{const.} \cdot x^4$ are orthogonal to (d).


4 The orthogonal trajectories are the ellipses 
x? + py* = const. 


§8.3 pages 288-289 


1 A function $u(x)$ of one variable is harmonic ($(2\delta)^{-1}\int_{\bar x - \delta}^{\bar x + \delta} u(x)\,dx = u(\bar x)$) if and only if it is affine, $u = Ax + B$, because the analog of Laplace's equation is $d^2u/dx^2 = 0$. The analog of the Poisson Integral Formula is the formula $u(x) = (x - a)(b - a)^{-1}u(b) + (b - x)(b - a)^{-1}u(a)$ giving $u(x)$ for $a \le x \le b$ in terms of $u(a)$, $u(b)$.

2 From $r\,dr = x\,dx + y\,dy + z\,dz$ the identity $r^2\,dr\,\omega = dx\,dy\,dz$ follows easily. The identity $d\omega = 0$ can be proved by simple differentiation. [A more satisfying, if not simpler, method is to observe that $\omega$ is unchanged by the substitution $x \to cx$, $y \to cy$, $z \to cz$; this can be shown to imply that if $\omega$ is expressed in terms of the new (local) coordinates $r = \sqrt{x^2 + y^2 + z^2}$, $u = x/r$, $v = y/r$ it does not involve $r$, hence is a 2-form in 2 variables and must be closed.] The proof that the double integral $\iint_D \omega\,dr$ can be written as an iterated integral will be omitted. By (v) of §2.6,
$$\frac{d}{dr}\int_{r_0}^{r} f(r)\,dr = \lim_{\Delta r \to 0}\frac{1}{\Delta r}\int_r^{r+\Delta r} f(r)\,dr = f(r).$$
The left side is constant because $d\omega = 0$. For the last step, one must observe that on the sphere $a^2 + b^2 + c^2 = 1$ the identity $a\,da + b\,db + c\,dc = 0$ is satisfied, which gives
$$a^2\,db\,dc + ab\,dc\,da + ac\,da\,db = a^2\,db\,dc + b\,dc(-b\,db - c\,dc) + c(-b\,db - c\,dc)\,db = a^2\,db\,dc - b^2\,dc\,db - c^2\,dc\,db = db\,dc$$
and analogous identities for $dc\,da$, $da\,db$.

3 The difference of two harmonic functions with the same values on $\partial D$ is a harmonic function which is zero on $\partial D$. It is to be shown that such a function is identically zero. If not, then its absolute value has a non-zero maximum at some point $P$ inside $D$. Its average over a ball with center $P$ is equal to its maximum value, which means it is constant on such a ball. Letting the ball expand until it touches $\partial D$, this implies a contradiction.

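4 $d(r^{-1}) = -r^{-2}\,dr = -\frac12r^{-3}\,d(r^2) = -(x\,dx + y\,dy + z\,dz)r^{-3}$. Hence $\omega = -\big((\partial(r^{-1})/\partial x)\,dy\,dz + (\partial(r^{-1})/\partial y)\,dz\,dx + (\partial(r^{-1})/\partial z)\,dx\,dy\big)$, and $d\omega = 0$ is Laplace's equation for $r^{-1}$. Similarly, in $n$ dimensions $d(r^{2-n}) = (2 - n)r^{1-n}\,dr = (2 - n)(x_1\,dx_1 + \cdots + x_n\,dx_n)/r^n$. Thus $\sum_{i=1}^n (-1)^{i-1}(\partial(r^{2-n})/\partial x_i)\,dx_1 \cdots \widehat{dx_i} \cdots dx_n$ is $(2 - n)$ times
$$\frac{1}{r^n}\big[x_1\,dx_2\,dx_3 \cdots dx_n - \cdots + (-1)^{n-1}x_n\,dx_1\,dx_2 \cdots dx_{n-1}\big],$$
which is the $n$-dimensional analog of $\omega$ and can be shown to be closed in the same way.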

5 Following Exercise 2, the average of $u(x, y, z)$ over $\{(x - \bar x)^2 + (y - \bar y)^2 + (z - \bar z)^2 = r^2\}$ should be defined to be
$$\frac{\iint_{a^2+b^2+c^2=1} u(\bar x + ra, \bar y + rb, \bar z + rc)\,(a\,db\,dc + b\,dc\,da + c\,da\,db)}{\iint_{a^2+b^2+c^2=1}(a\,db\,dc + b\,dc\,da + c\,da\,db)}$$
which can also be written
$$\frac{1}{4\pi r^3}\iint_{(x-\bar x)^2+(y-\bar y)^2+(z-\bar z)^2=r^2} u(x, y, z)\,\big[(x - \bar x)\,dy\,dz + (y - \bar y)\,dz\,dx + (z - \bar z)\,dx\,dy\big].$$


6, 7 Straightforward differentiation. The matrix formulation is that the columns of
$$\begin{pmatrix}u & -v\\ v & u\end{pmatrix}\begin{pmatrix}r & -s\\ s & r\end{pmatrix}$$
are conformal coordinates if $(u, v)$ and $(r, s)$ are.


§8.4 pages 310-313 


1 It suffices to show that $|z_1 + z_2|^2 \le (|z_1| + |z_2|)^2$. Expanding the right side and using the definition of $|z|^2$, this becomes the Schwarz inequality $x_1x_2 + y_1y_2 \le \sqrt{x_1^2 + y_1^2}\sqrt{x_2^2 + y_2^2}$.

2 The sequence $z_n = x_n + iy_n$ converges to $z_\infty = x_\infty + iy_\infty$ if and only if both of the real sequences converge, $x_n \to x_\infty$, $y_n \to y_\infty$.

3 Choose an $N$ such that $|z_n - z_N| < 1$ for $n > N$. Let $K'$ be the largest of the numbers




$|z_1|, |z_2|, \ldots, |z_N|$ and set $K = K' + 1$. Then $|z_n| < K$ for all $n$.

4 Every complex number can be written in the form
$$\begin{pmatrix}r\cos\theta & -r\sin\theta\\ r\sin\theta & r\cos\theta\end{pmatrix}.$$
Geometrically this matrix represents a transformation $\mathbb R^2 \to \mathbb R^2$ which is a rotation through an angle $\theta$ combined with a scale factor of $r$.


5 The vertices are $1$, $a + bi$, $a^2 - b^2 + 2abi$, $a^3 - 3ab^2 + 3a^2bi - b^3i$, $a^4 - 6a^2b^2 + b^4 + 4a^3bi - 4ab^3i$. In short, they are $A^0, A^1, A^2, A^3, A^4$ where $A = a + bi$. The basic relation is $A^5 = 1$, which gives $a^5 - 10a^3b^2 + 5ab^4 = 1$, $5a^4b - 10a^2b^3 + b^5 = 0$. Cancelling $b$ in the last relation and using $b^2 = 1 - a^2$ gives $16a^4 - 12a^2 + 1 = 0$, $a^2 = (3 \pm \sqrt5)/8$. Since $a = \cos(2\pi/5) < 1/2$, this gives $a^2 = (3 - \sqrt5)/8$, which has the solution $a = (\sqrt5 - 1)/4$ as desired. Then $b = \sqrt{1 - a^2}$ gives $b$ (see also Ex. 15, §9.1).
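A quick floating-point check of the pentagon: with $a = (\sqrt5 - 1)/4$ and $b = \sqrt{1 - a^2}$, the powers of $A = a + bi$ close up after five steps ($A^5 = 1$), and the five sides have equal length.

```python
import math

a = (math.sqrt(5) - 1) / 4
b = math.sqrt(1 - a * a)
A = complex(a, b)
vertices = [A**k for k in range(5)]
sides = [abs(vertices[(k + 1) % 5] - vertices[k]) for k in range(5)]
print(vertices, A**5)
```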

6 If $z^2 + 1 = z$ then $(z^2 - z + 1)(z + 1)(z^3 - 1) = 0$, $z^6 - 1 = 0$, $z^6 = 1$. The solutions of this equation are the vertices of a regular hexagon $\cos(2\pi j/6) + i\sin(2\pi j/6)$ ($j = 1, 2, \ldots, 6$). Of these 6 possibilities, only the points $j = 1, 5$ satisfy the given equation, i.e. the solutions are $(1/2) \pm i(\sqrt3/2)$.
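The selection of $j = 1, 5$ among the sixth roots of unity can be checked directly. This sketch tests each root against $z^2 + 1 = z$ with a small floating-point tolerance.

```python
import cmath
import math

sixth_roots = [(j, cmath.exp(2j * math.pi * j / 6)) for j in range(1, 7)]
solutions = [(j, z) for j, z in sixth_roots if abs(z * z + 1 - z) < 1e-12]
print(solutions)
```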


7 $(d/d\theta)e^{i\theta} = ie^{i\theta}$, $(d/d\theta)\cos\theta + i(d/d\theta)\sin\theta = i\cos\theta - \sin\theta$, $(d/d\theta)\cos\theta = -\sin\theta$, $(d/d\theta)\sin\theta = \cos\theta$.

8 $(d/dz)\log(1 + z) = 1/(1 + z)$. When $z = 0$ this gives $\lim_{n\to\infty}\big(\log(1 + z/n)/(z/n)\big) = 1$, $\lim_{n\to\infty} n\log(1 + z/n) = z$, $\lim_{n\to\infty} e^{n\log(1 + z/n)} = e^z$, $\lim_{n\to\infty}(1 + z/n)^n = e^z$.

9 By the theorem, $c_n$ is the average value of $z^{-n}f(z)$ on the circle $|z| = \rho$. Since the average of numbers of modulus $\le K\rho^{-n}$ also has modulus $\le K\rho^{-n}$, it follows that $|c_n| \le K\rho^{-n}$ for all $\rho$, hence (letting $\rho \to \infty$) $c_n = 0$ for $n > 0$.

10 The basic estimate ($s^{-1}[f(x + sh) - f(x)]$ is within $\epsilon$ of $f'(x)h$ for all sufficiently small $s$) is proved, as usual, by writing
$$f(x + sh) - f(x) = \int_x^{x+sh} f'(t)\,dt = \int_x^{x+sh} f'(x)\,dt + \int_x^{x+sh}[f'(t) - f'(x)]\,dt = f'(x)\,sh + \text{small} \cdot sh.$$
In short, the entire proof is the same as in §5.3.


11 x? is one example. 


12 Immediate. Differentiation under the integral sign requires justification, of course. In the present case it suffices to show that $z^{-n}$ is uniformly differentiable on $0 < r \le |z| \le R$, which is easily proved by estimating $(z + h)^{-n} - z^{-n}$ directly.
13 Elementary 


14 It is easy to show that there is a constant $K$ such that $|A^{-1}w| \le K|w|$, where $A^{-1}$ is the inverse matrix. Let $\alpha = K^{-1}$ and $w = Av$. Let $K' = \max(|c_0|, |c_1|, \ldots, |c_{n-1}|)$. Then by induction $\max(|c_k|, |c_{k+1}|, \ldots, |c_{k+n-1}|) \ge K'\alpha^k$, so $K = K'\alpha$ has the desired property. Since the terms of the series $\sum c_k\alpha^{-k}$ do not approach zero, the series can't converge.


15 As in the text, it suffices to show that $c_n\alpha^n$ is bounded for some $\alpha > 0$, which follows by the method of Exercise 14. Let $g(x)$ be the function defined by the power series $c_0 + c_1x + c_2x^2 + \cdots$. Let $F(x) = f(x)g(x)$. Then $F(0) = a_0c_0 = 1$, $F'(0) = f'(0)g(0) + f(0)g'(0) = a_1c_0 + a_0c_1 = 0$ and, similarly, all derivatives of $F$ are 0 at 0. Therefore the Taylor expansion of $F$ is $F(x) = 1$, so $g(x) = [f(x)]^{-1}$.
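The conditions $F(0) = 1$, $F'(0) = F''(0) = \cdots = 0$ determine the $c_n$ recursively: $a_0c_0 = 1$ and $\sum_{k=0}^n a_kc_{n-k} = 0$ for $n \ge 1$. Below is a small exact sketch of that recursion, with an illustrative function name. It is checked on $1/(1 - x)$ and on $1/e^x = e^{-x}$.

```python
from fractions import Fraction
from math import factorial


def reciprocal_series(a, n_terms):
    """First n_terms coefficients c_k of g = 1/f, where f(x) = sum a_k x^k and a_0 != 0.

    They are forced by F = f g = 1: a_0 c_0 = 1 and sum_{k=0}^{n} a_k c_{n-k} = 0 for n >= 1.
    """
    c = [Fraction(1) / a[0]]
    for n in range(1, n_terms):
        s = sum(a[k] * c[n - k] for k in range(1, min(n, len(a) - 1) + 1))
        c.append(-s / a[0])
    return c


exp_coeffs = [Fraction(1, factorial(k)) for k in range(8)]
print(reciprocal_series(exp_coeffs, 8))   # coefficients of exp(-x): (-1)^k / k!
```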


§8.5 pages 319-320


1 (a) Set $u = A(x - \bar x) + B(y - \bar y) + C(z - \bar z)$ and $v = D(x - \bar x) + E(y - \bar y) + F(z - \bar z)$. If $du\,dv \ne 0$ then by the Implicit Function Theorem these equations can be solved for two of the variables in terms of $u$, $v$ and the third variable. Setting $u = v = 0$ then gives the line in parametric form. If $du\,dv = 0$ then $u = 0$, $v = 0$ is a plane or all of $xyz$-space, hence not a line. (b) The coefficient of
w3 in w; is the coefficient of wywow3 in wjw1w2. 
If wyw2 is a multiple of wjws this is zero. (c) 
This is essentially the same as (b), only with 
(x — x) in place of dx, (y — Y) in place of dy 
and (z — Z) in place of dz. 

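2 Choose $\omega_{k+1}, \ldots, \omega_n$ such that $\omega_1\omega_2\cdots\omega_n \ne 0$. Then $\omega = a_1\omega_1 + a_2\omega_2 + \cdots + a_n\omega_n$, and $a_i$ is the coefficient of $\omega_i\omega_1\cdots\omega_{i-1}\omega_{i+1}\cdots\omega_n$ in $\omega\,\omega_1\cdots\omega_{i-1}\omega_{i+1}\cdots\omega_n$. If $\bar\omega_1\bar\omega_2\cdots\bar\omega_k$ is a multiple of $\omega_1\omega_2\cdots\omega_k$, then the expression of $\bar\omega_i$ $(i = 1, 2, \ldots, k)$ has no term in $\omega_j$ for $j > k$.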

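3 $(AF - BE)(\partial D/\partial z - \partial C/\partial t) + (AG - CE)(\partial B/\partial t - \partial D/\partial y) + (AH - DE)(\partial C/\partial y - \partial B/\partial z) + (BG - CF)(\partial D/\partial x - \partial A/\partial t) + (BH - DF)(\partial A/\partial z - \partial C/\partial x) + (CH - DG)(\partial B/\partial x - \partial A/\partial y) = 0$ and the analogous condition obtained by interchanging $A \leftrightarrow E$, $B \leftrightarrow F$, $C \leftrightarrow G$, $D \leftrightarrow H$.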

4 By the theorem there is an F such that the 
curves F = const. are solutions of the differ- 
ential equation. This means that dF is a multiple 


Answers to Exercises | pages 326-327 492 


of A dx + B dy, i.e. a multiple of A dx + B dy 
is exact. 


5 Given $f: \mathbb{R}^n \to \mathbb{R}^n$ as in the theorem, consider the differential equations $dx_1 - f_1(x)\,dt = 0$, $dx_2 - f_2(x)\,dt = 0, \ldots, dx_n - f_n(x)\,dt = 0$ in the $n + 1$ variables $x_1, x_2, \ldots, x_n, t$. By the theorem of this section there is a nonsingular map $F: \mathbb{R}^{n+1} \to \mathbb{R}^n$ of rank $n$ such that the curves $F = $ const. satisfy the differential equations. The pullback of $dF_1\,dF_2\cdots dF_n$ is $dx_1\,dx_2\cdots dx_n +$ terms in $dt$, so the equations $F = $ const. can be solved for $x_1, x_2, \ldots, x_n$ as functions of $F_1, F_2, \ldots, F_n, t$. On the other hand, $F_1, F_2, \ldots, F_n$ can be written as functions of $x_1, x_2, \ldots, x_n$ for $t = 0$.


6 Since each $dx_i$ can be written as a combination of the $\omega_j$, each 2-form $dx_i\,dx_j$ can be written as a combination of the $\omega_p\omega_q$, hence any 2-form can be written as a combination of the $\omega_p\omega_q$. Moreover, the coefficient of $\omega_1\omega_2$, say, in the 2-form $\omega$ is determined by the fact that it is the coefficient of $\omega_1\omega_2\omega_3\cdots\omega_n$ in $\omega\,\omega_3\omega_4\cdots\omega_n$. In this way the integrability conditions can be reduced to the condition that the 2-forms $d\omega_i$ $(i = 1, 2, \ldots, k)$, when expressed as combinations of the $\omega_p\omega_q$, contain no terms in which both $p > k$ and $q > k$. This imposes $\binom{n-k}{2}$ conditions on each $d\omega_i$.


§8.6 pages 326-327 


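1 $d\theta = (x\,dy - y\,dx)/(x^2 + y^2)$ has the property that its integral over $\gamma_1$ is $2\pi$, and that its integral over $\gamma_2$ is zero. The closed 1-form (where $r = \sqrt{x^2 + y^2}$)
$$\frac{(r - 2)\,dz - z\,dr}{(r - 2)^2 + z^2} = \frac{\sqrt{x^2 + y^2}\,\bigl(\sqrt{x^2 + y^2} - 2\bigr)\,dz - z(x\,dx + y\,dy)}{\sqrt{x^2 + y^2}\,\bigl[\bigl(\sqrt{x^2 + y^2} - 2\bigr)^2 + z^2\bigr]}$$
has the property that its integral over $\gamma_1$ is zero and that its integral over $\gamma_2$ is $2\pi$.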

2 (a) Let $P_0 = (1, 1)$, $P_1 = (-1, 1)$, $P_2 = (-1, -1)$, $P_3 = (1, -1)$. Then the integral of $d\phi$ over the boundary of the square is $[\phi(P_0) - \phi(P_3)] + [\phi(P_1) - \phi(P_0)] + [\phi(P_2) - \phi(P_1)] + [\phi(P_3) - \phi(P_2)] = 0$ by $\int_P^Q d\phi = \phi(Q) - \phi(P)$. (b) Parameterizing the sides $\{x = 1, y = t, -1 \le t \le 1\}$, $\{y = 1, x = -t, -1 \le t \le 1\}$, etc., the integral is $4\int_{-1}^1 (1 + t^2)^{-1}\,dt$. On the other hand it is the integral of $d\theta$ over the boundary of a domain containing $(0, 0)$, and hence it is $2\pi$. Therefore
$$\frac{\pi}{2} = \int_{-1}^{1} \frac{dt}{1 + t^2}.$$
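As a numerical cross-check of part (b), and not the method of the answer, Simpson's rule applied to one side of the square reproduces $\pi/2$. The helper `simpson` and its step count are assumptions of this sketch.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


# Integral of d(theta) along the side {x = 1, y = t, -1 <= t <= 1}.
side = simpson(lambda t: 1 / (1 + t * t), -1.0, 1.0)
print(side, math.pi / 2)
print(4 * side, 2 * math.pi)  # four congruent sides give 2*pi
```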


3 Instead of going around the circle to $\bigl(x/\sqrt{x^2 + y^2},\ y/\sqrt{x^2 + y^2}\bigr)$, go around the square to $\bigl(x/\max(|x|, |y|),\ y/\max(|x|, |y|)\bigr)$.

4 Given a 1-form $\omega$ with $d\omega = 0$, define $F(P)$ to be the integral of $\omega$ over a smooth curve from $(1, 0, 0)$ to $P$ which does not pass through $(0, 0, 0)$. The problem is to show that $F(P)$ is well-defined, i.e. independent of the choice of the curve. This is hard to do rigorously, but not hard to imagine.


5 See Exercise 10, §3.3. 


6,7 Hard to do rigorously. Good exercises for 
the imagination. 

8 The circles $\{(x - 1)^2 + y^2 = \epsilon^2\}$, $\{(x + 1)^2 + y^2 = \epsilon^2\}$, oriented counterclockwise, are a basis $(0 < \epsilon < 1)$.


9 Assuming $f(x)$ is differentiable, Poincaré's Lemma says that since $d[f(x)\,dx] = 0$ there must be a function $F(x)$ such that $dF = f(x)\,dx$. The two proofs are the same except that in Chapter 3 the end point $a$ was used rather than the center point $(a + b)/2$ of the interval $\{a \le x \le b\}$.

10 The integrals of $\omega - \sum b_j\omega_j$ over $\gamma_1, \gamma_2, \ldots, \gamma_N$ are zero; therefore, since it is closed, it is exact by the definition of 'homology basis.' Therefore its integral over every closed curve, in particular over each of the curves in question, is zero, which gives the desired equations.


§8.7 page 333


1 Follows immediately from the continuity 
equation. 


2 (a) The lines $x = $ const., equally spaced. (b) Radial lines, equally spaced. (c) Circles centered at $(0, 0)$ drawn with a density proportional to $r^{-1}$, i.e. denser near $(0, 0)$ and thinner far away. (d) The lines $x = $ const., $y = $ const., equally spaced. (e) Radial lines, equally spaced.


3 Since $\omega \ne 0$, one can assume that the $dx_1\,dx_2\cdots dx_{n-1}$ term is not zero, hence that
$$\omega = A(dx_1\,dx_2\cdots dx_{n-1} - u_1\,dx_n\,dx_2\cdots dx_{n-1} - u_2\,dx_1\,dx_n\,dx_3\cdots dx_{n-1} - \cdots) = A(dx_1 - u_1\,dx_n)(dx_2 - u_2\,dx_n)\cdots(dx_{n-1} - u_{n-1}\,dx_n).$$
By the theorem of §8.6 there exist functions $y_1, y_2, \ldots, y_{n-1}$ such that $dx_1 - u_1\,dx_n, \ldots, dx_{n-1} - u_{n-1}\,dx_n$ are combinations of $dy_1, dy_2, \ldots, dy_{n-1}$ and vice versa. Thus $\omega = A\,dy_1\,dy_2\cdots dy_{n-1}$. Setting $\rho = A^{-1}$ then gives $\rho\omega = dy_1\,dy_2\cdots dy_{n-1}$. If $A$ can be expressed as a function of $y_1, y_2, \ldots, y_{n-1}$ then $dA$ can be expressed in terms of $dy_1, dy_2, \ldots, dy_{n-1}$ and $d\omega = dA\,dy_1\,dy_2\cdots dy_{n-1} = 0$. Conversely, if $d\omega = 0$ then the map $(x_1, x_2, \ldots, x_n) \mapsto (y_1, y_2, \ldots, y_{n-1}, A)$ is of rank $n - 1$ and the Implicit Function Theorem gives $A$ as a function of $y_1, y_2, \ldots, y_{n-1}$. If $z_1, z_2, \ldots, z_{n-1}$ is another set of functions satisfying $dz_1\,dz_2\cdots dz_{n-1} = \rho'\omega$ for some $\rho'$, then $dz_1\,dz_2\cdots dz_{n-1}$ is a multiple of $dy_1\,dy_2\cdots dy_{n-1}$, and therefore $dz_1, dz_2, \ldots, dz_{n-1}$ are combinations of $dy_1, dy_2, \ldots, dy_{n-1}$ and vice versa. Hence the differential equations $dz_i = 0$ ($z_i = $ const.) and $dy_i = 0$ ($y_i = $ const.) define the same curves in $x_1x_2\cdots x_n$-space. Drawing points $P$ in $y_1y_2\cdots y_{n-1}$-space with density $A(y_1, y_2, \ldots, y_{n-1})$ and then drawing the curves $y = P$ in $x_1x_2\cdots x_n$-space gives curves such that $\int_S \omega$ is proportional to the number of curves which cross the $(n - 1)$-dimensional manifold $S$, counting the sign of an intersection as $+$ or $-$ depending on whether $\omega$ is $+$ or $-$ on the oriented surface $S$ at the point of intersection.


§8.8 pages 354-356


1 Because $r^{-1}$ is harmonic, the 2-form
$$\frac{\partial}{\partial x}(u - Ar^{-1})\,dy\,dz + \frac{\partial}{\partial y}(u - Ar^{-1})\,dz\,dx + \frac{\partial}{\partial z}(u - Ar^{-1})\,dx\,dy$$
is closed, and its integral over any two large spheres is the same. Fix a sphere and choose $A$ so that this integral is zero. Since $u$ is radially symmetric it is constant on spheres $r = $ const., and by the Implicit Function Theorem $u$ can be expressed (locally) as a function of $r$. Hence $du = u'(r)\,dr = f(r)(x\,dx + y\,dy + z\,dz)$. The corresponding 2-form is therefore $f(r)[x\,dy\,dz + y\,dz\,dx + z\,dx\,dy]$, and its integral over a large sphere $S$, which is zero, is also equal to $f(r)\int_S (x\,dy\,dz + y\,dz\,dx + z\,dx\,dy) = f(r)\cdot 4\pi r^3$ because $f(r)$ is constant on $S$. Hence $f(r) = 0$, $du = 0$, $u = $ const.

2 Let $x' = \gamma(x - vt)$, $y' = y$, $z' = z$, $t' = \gamma(-\epsilon\mu vx + t)$ where $\gamma = (1 - \epsilon\mu v^2)^{-1/2}$. Then the charged particle is stationary at the origin $x' = y' = z' = 0$ relative to the coordinates $x'y'z't'$. In $x'y'z't'$-coordinates $B$ is zero, and the electromagnetic field is given by Coulomb's law as
$$-\frac{e}{4\pi\epsilon r'^3}(x'\,dx'\,dt' + y'\,dy'\,dt' + z'\,dz'\,dt')$$
where $e$ is the charge and $r' = (x'^2 + y'^2 + z'^2)^{1/2}$. Expressing this in terms of $x, y, z, t$ it becomes
$$\left(-\frac{e\gamma}{4\pi r^3}\right)\left[\frac{x - vt}{\epsilon}\,dx\,dt + \frac{y}{\epsilon}\,dy\,dt + \frac{z}{\epsilon}\,dz\,dt + 0\,dy\,dz - \mu vz\,dz\,dx + \mu vy\,dx\,dy\right]$$
where $\gamma = (1 - \epsilon\mu v^2)^{-1/2}$ and where $r = [\gamma^2(x - vt)^2 + y^2 + z^2]^{1/2}$.
3 (a) The laws are expressed entirely in terms of these correspondences and in terms of the operation of differentiation of forms, which is the same in any coordinate system (not just under Lorentz transformations of coordinates). (b) If, for example, one starts with the 2-form $dz'\,dx'$, forms the pullback $\gamma\,dz\,dx - \gamma v\,dz\,dt$, and takes the corresponding 2-form $(1/\mu)[\epsilon\mu(-\gamma v)\,dx\,dy - \gamma\,dy\,dt]$, the result is the same as if one starts with $dz'\,dx'$, takes the corresponding 2-form $(1/\mu)[-dy'\,dt']$, and forms the pullback $(1/\mu)[-\gamma\,dy\,dt + \epsilon\mu\gamma v\,dy\,dx]$. The proof of the other cases is the same. (c) This is the so-called 'star operation' $\{\text{1-forms}\} \leftrightarrow \{\text{3-forms}\}$, $\{\text{2-forms}\} \leftrightarrow \{\text{2-forms}\}$ determined by the 'Lorentz metric' $x^2 + y^2 + z^2 - c^2t^2$.


§9.1 pages 375-380


1 $(8, 2, 5) + (5, 7, 1) = (40, 10 + 56, 25 + 8) = (40, 66, 33) = (40, 34, 1)$. $(40, 34, 1) + (7, 4, 9) = (280, 238 + 160, 7 + 360) = (280, 38, 7)$. $(3/8) + (-6/5) + (5/7) = (105 - 336 + 200)/280 = -31/280$. $(8, 2, 5)(5, 7, 1) = (40, 2 + 35, 5 + 14) = (20, 16, 7)$. $(20, 16, 7)(7, 4, 9) = (140, 144 + 28, 63 + 64) = (140, 80 + 28, 63) = (140, 110, 65) = (28, 22, 13)$. $(3/8)(-6/5)(5/7) = (3/4)(-3)(1/7) = -9/28$.
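The computations in Exercise 1 are consistent with reading a triple $(a, b, c)$ as the rational number $(c - b)/a$. Under that reading the sum is $(a, b, c) + (d, e, f) = (ad, bd + ae, cd + af)$ and the product is $(a, b, c)(d, e, f) = (ad, ce + bf, cf + be)$. These rules are inferred from the numbers above rather than quoted from the text, so the sketch below is a reconstruction under that assumption.

```python
from fractions import Fraction

# Assumption (inferred from the worked numbers): (a, b, c) stands for (c - b)/a.


def value(t):
    a, b, c = t
    return Fraction(c - b, a)


def add(s, t):
    a, b, c = s
    d, e, f = t
    return (a * d, b * d + a * e, c * d + a * f)


def mul(s, t):
    a, b, c = s
    d, e, f = t
    return (a * d, c * e + b * f, c * f + b * e)


p, q, r = (8, 2, 5), (5, 7, 1), (7, 4, 9)
print(add(p, q))  # (40, 66, 33), equivalent to (40, 34, 1)
print(value(add(add(p, q), r)))
print(mul(p, q))  # (40, 37, 19), equivalent to (20, 16, 7)
print(value(mul(mul(p, q), r)))
```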
2 $(a, b, b + d)(d, 1, a + 1) = (1, 1, 1 + 1)$. $(a, c + d, c)(d, a + 1, 1) = (1, 1, 1 + 1)$. If $b = c$ then $(a, b, c)$ is zero and $q^{-1}$ does not exist.
3 Reflexive and symmetric are obvious. If $a + 6p = b + 6q$ and if $b + 6p' = c + 6q'$ then $a + 6(p + p') = (a + 6p) + 6p' = (b + 6q) + 6p' = (b + 6p') + 6q = c + 6q' + 6q = c + 6(q' + q)$, hence $a \equiv c \pmod 6$ and the relation is transitive. If $a \equiv b \pmod 6$ then $ac \equiv bc \pmod 6$ because $(a + 6p)c = (b + 6q)c$. Thus $a \equiv b$ and $c \equiv d \pmod 6$ imply $ac \equiv bc = cb \equiv db = bd$, i.e. $ac \equiv bd \pmod 6$. The proof that $a + c \equiv b + d \pmod 6$ is even easier. By division by 6, every natural number $a$ can be written in the form $a = 6q + r$ where $r = 0, 1, 2, 3, 4, 5$. Thus $a \equiv 1$ or $a \equiv 2$ or $\cdots$ or $a \equiv 6 \pmod 6$. Given $a$, $b$, in solving $a + x \equiv b \pmod 6$ one can assume $b > a$ (add a multiple of 6 to $b$ if necessary), hence $b = a + d$ and the equation is solved by $x \equiv d \pmod 6$. $3 \cdot 2 \equiv 6 \pmod 6$ and $3 \cdot 6 \equiv 6 \pmod 6$, but $2 \not\equiv 6 \pmod 6$. Further, $3x \equiv 1 \pmod 6$ has no solution $x$. $1^2 = 1$, $2^2 = 4$, $3^2 \equiv 3$, $4^2 \equiv 4$, $5^2 \equiv 1$, $6^2 \equiv 6$; therefore the squares are 1, 3, 4, 6.
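The mod-6 facts at the end of Exercise 3 can be confirmed by brute force. The sketch writes residues as $1, 2, \ldots, 6$ to match the answer. The helper name `residue` is hypothetical.

```python
def residue(a, n=6):
    """Residue of a mod n written as one of 1, 2, ..., n (n plays the role of 0)."""
    r = a % n
    return r if r else n


squares = sorted({residue(k * k) for k in range(1, 7)})
print(squares)  # [1, 3, 4, 6]

# 3x takes only the values 3 and 6 mod 6, so 3x = 1 (mod 6) has no solution.
print(sorted({residue(3 * x) for x in range(1, 7)}))  # [3, 6]

# Cancellation fails: 3*2 = 3*6 (mod 6) although 2 and 6 differ mod 6.
print(residue(3 * 2) == residue(3 * 6), residue(2) == residue(6))  # True False
```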

4 $ax \equiv b \pmod n$ has a unique solution $x$ if and only if $a$ is invertible mod $n$. This is true if and only if $a$ and $n$ have no common factors other than 1. This is true for all $a < n$ if and only if $n$ is prime.

5 $ab \equiv 1 \pmod n$ means $ab + pn = 1 + qn$. This means that $b$ steps of size $a$ plus $p$ steps of size $n$ to the right, followed by $q$ steps of size $n$ to the left, result in one step to the right all told. Repeating the process one can take any number of steps to the right. To take a step to the left, take $n - 1$ steps to the right and a step of size $n$ to the left. Conversely, suppose one can go from 1 to 2 in steps of size $a$ and $n$. Adding the steps to the right and to the left separately, there exist natural numbers $p, p', q, q'$ such that $1 + pa + p'n = 2 + qa + q'n$. Hence $pa \equiv 1 + qa \pmod n$. Choose $x$ such that $p \equiv q + x \pmod n$. Then $ax \equiv 1 \pmod n$, i.e. $a$ is invertible mod $n$. Given $a$, $n$, let $d$ be the smallest natural number such that it is possible to go from 1 to $1 + d$ in steps of size $a$ and $n$. Then it is possible to take steps of size $d$ in either direction. Thus the points $1 + 2d, 1 + 3d, \ldots$ can all be reached from 1, and if any point between them could be reached then a point between 1 and $1 + d$ could be reached, contrary to assumption. Since $1 + a$ and $1 + n$ can be reached, it follows that $a$ and $n$ are multiples of $d$. If $c$ is any factor common to $a$ and $n$ then all steps can be broken into steps of size $c$, so the step from 1 to $1 + d$ can be accomplished in steps of size $c$; hence $c$ divides $d$. This proves:

Theorem. $a$ is invertible mod $n$ if and only if the greatest common factor of $a$ and $n$ is 1.


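6 $49 = 32 + 17$, $32 = 17 + 15$, $17 = 15 + 2$, $15 = 7 \cdot 2 + 1$, so
$$1 = 15 - 7 \cdot 2 = 15 - 7(17 - 15) = 8 \cdot 15 - 7 \cdot 17 = 8(32 - 17) - 7 \cdot 17 = 8 \cdot 32 - 15 \cdot 17 = 8 \cdot 32 - 15(49 - 32) = 23 \cdot 32 - 15 \cdot 49.$$
Hence $49 \cdot 15 = -1 + 32 \cdot 23$, $49 \cdot 15 \cdot 31 = -31 + 32 \cdot 23 \cdot 31 = 1 + 32(23 \cdot 31 - 1)$, $49 \cdot 465 = 1 + 32 \cdot 712$. Similarly, $48 \cdot 11 = 1 + 31 \cdot 17$ and $63 \cdot 7 = 1 + 40 \cdot 11$.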
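The back-substitution in Exercise 6 is the extended Euclidean algorithm. A minimal recursive sketch recovers the same relation $1 = 23 \cdot 32 - 15 \cdot 49$ and the inverses used above. The function names are illustrative.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) = s*a + t*b (Euclid run backwards)."""
    if b == 0:
        return a, 1, 0
    q, r = divmod(a, b)
    g, s, t = extended_gcd(b, r)
    # g = s*b + t*r and r = a - q*b, hence g = t*a + (s - q*t)*b.
    return g, t, s - q * t


def inverse_mod(a, n):
    """Smallest natural x with a*x = 1 (mod n); requires gcd(a, n) = 1."""
    g, s, _ = extended_gcd(a, n)
    if g != 1:
        raise ValueError("a is not invertible mod n")
    return s % n


print(extended_gcd(49, 32))  # (1, -15, 23): 1 = -15*49 + 23*32
print(inverse_mod(49, 32), inverse_mod(48, 31), inverse_mod(63, 40))
```

Note that $465 \equiv 17 \pmod{32}$, so the answer's multiplier 465 and the smallest inverse 17 agree.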

7 In other words, are 1953 and 5115 without common factors? Equivalently, can steps of size 1 be taken using steps of size 1953 and 5115? From the equations $5115 = 2 \cdot 1953 + 1209$, $1953 = 1 \cdot 1209 + 744$, $1209 = 1 \cdot 744 + 465$, $744 = 1 \cdot 465 + 279$, $465 = 1 \cdot 279 + 186$, $279 = 1 \cdot 186 + 93$, $186 = 2 \cdot 93 + 0$, it follows that 93 divides 186, hence divides 279, hence divides 465, etc. Thus one is led to the reduction $(1953/5115) = (93/93)(21/55) = 21/55$.
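The chain of divisions in Exercise 7 can be generated mechanically. This sketch prints each step and the resulting reduction. The helper name `euclid_steps` is hypothetical.

```python
import math
from fractions import Fraction


def euclid_steps(a, b):
    """List the divisions a = q*b + r of Euclid's algorithm until the remainder is 0."""
    steps = []
    while b:
        q, r = divmod(a, b)
        steps.append((a, q, b, r))
        a, b = b, r
    return steps, a  # a is now the last nonzero remainder, i.e. the gcd


steps, g = euclid_steps(5115, 1953)
for a, q, b, r in steps:
    print(f"{a} = {q}*{b} + {r}")
print(g, 1953 // g, 5115 // g)  # 93 21 55
print(Fraction(1953, 5115))     # 21/55
```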

8 The $(j + 1)$st term in the expansion of $(1 + 1/n)^n$ is $\dfrac{n(n - 1)\cdots(n - j + 1)}{n^j\,j!}$. This is at most $1/j!$ but approaches $1/j!$ as $n \to \infty$. The desired equation follows easily from this observation. It is easier to sum the series $1 + 1 + (1/2) + (1/6) + (1/24) + \cdots$ than to find the limit directly. The first 7 terms give three-place accuracy.
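To see the three-place claim concretely, the sketch below compares the sum of the first 7 terms of the series with $e$. It also shows how slowly $(1 + 1/n)^n$ itself approaches $e$. It uses only the standard `math` module.

```python
import math

# Sum of the first 7 terms 1 + 1 + 1/2! + ... + 1/6!.
partial = sum(1 / math.factorial(j) for j in range(7))
print(partial, math.e, math.e - partial)

# The sequence (1 + 1/n)^n approaches e much more slowly.
for n in (10, 100, 1000):
    print(n, (1 + 1 / n) ** n)
```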


9 The interval between $s_n$ and $s_{n+1}$ contains all later partial sums $s_m$ $(m > n)$ and has length $|a_{n+1}| \to 0$.


10 'Not convergent' means there is an $\epsilon$ such that for every $N$ there is an $n > N$ such that $|q_n - q_N| > \epsilon$. Applying this again gives an $m > n$ such that $|q_m - q_n| > \epsilon$. Since the sequence is increasing, it follows that $q_m - q_N = |q_m - q_n| + |q_n - q_N| > 2\epsilon$. Applying it a third time gives a $p > m$ such that $q_p - q_N > 3\epsilon$. Repeating ad infinitum gives, for every $k$, terms $q_n$ with $q_n - q_N > k\epsilon$. Therefore $q_n$ is arbitrarily large.

11 $(b + 1)a = ba + a \ge b + a > b$. If $a$, $b$ are positive rational numbers then there is a natural number $d$ such that $da$, $db$ are natural numbers. Hence there is a natural number $n$ such that $nda > db$, i.e. $na > b$. By the binomial theorem (or by an elementary induction on $n$), $(1 + a)^n \ge 1 + na$. Thus $(1 + a)^n > b$ (if $a > 0$, $b > 0$) for $n$ sufficiently large. If $|q_1| < 1$ then either $q_1 = 0$ or $|q_1|^{-1} = 1 + a$ for some $a > 0$. In the latter case there is an $n$ such that $|q_1|^{-n} > q_2^{-1}$, hence $|q_1|^n < q_2$. In the former case $|q_1| < q_2$.

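12
$$\begin{pmatrix}1037\\2379\end{pmatrix} = \begin{pmatrix}0&1\\1&2\end{pmatrix}\begin{pmatrix}0&1\\1&3\end{pmatrix}\begin{pmatrix}0&1\\1&2\end{pmatrix}\begin{pmatrix}0&1\\1&2\end{pmatrix}\begin{pmatrix}0\\61\end{pmatrix} = \begin{pmatrix}7&17\\16&39\end{pmatrix}\begin{pmatrix}0\\61\end{pmatrix}$$
which gives $1037/2379 = 17/39$. Written as a continued fraction it is 1 over 2 plus 1 over 3 plus 1 over 2 plus 1 over 2.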
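The matrix product in Exercise 12 can be reproduced in two stages. First, Euclid's algorithm gives the partial quotients $2, 3, 2, 2$. Then the matrices $\begin{pmatrix}0&1\\1&a_n\end{pmatrix}$ are multiplied together, and the right column of the product gives $p_n$, $q_n$. The helper names below are hypothetical.

```python
from fractions import Fraction


def cf_quotients(num, den):
    """Partial quotients of num/den (0 < num < den) from r_{n-1} = a_n r_n + r_{n+1}."""
    r_prev, r = den, num  # r_0 = den, r_1 = num
    quotients = []
    while r:
        a, rem = divmod(r_prev, r)
        quotients.append(a)
        r_prev, r = r, rem
    return quotients


def convergent_matrix(quotients):
    """Product of the matrices [[0, 1], [1, a_n]]; equals [[p_{n-1}, p_n], [q_{n-1}, q_n]]."""
    m = [[1, 0], [0, 1]]
    for a in quotients:
        m = [[m[0][1], m[0][0] + a * m[0][1]],
             [m[1][1], m[1][0] + a * m[1][1]]]
    return m


qs = cf_quotients(1037, 2379)
m = convergent_matrix(qs)
print(qs)  # [2, 3, 2, 2]
print(m)   # [[7, 17], [16, 39]]
print(Fraction(m[0][1], m[1][1]), Fraction(1037, 2379))
```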


13 Because the determinant of a product is the product of the determinants, it follows that $p_{n-1}q_n - p_nq_{n-1} = (-1)^n$. If $p_n$ and $q_n$ are both divisible by $c$ then $c$ divides $(-1)^n$ and therefore $c = \pm 1$, i.e. $p_n/q_n$ is in lowest terms. The equations
$$\begin{pmatrix}r_n\\r_{n-1}\end{pmatrix} = \begin{pmatrix}0&1\\1&a_n\end{pmatrix}\begin{pmatrix}r_{n+1}\\r_n\end{pmatrix}$$
imply
$$\begin{pmatrix}r_1\\r_0\end{pmatrix} = \begin{pmatrix}p_{n-1}&p_n\\q_{n-1}&q_n\end{pmatrix}\begin{pmatrix}r_{n+1}\\r_n\end{pmatrix}$$
and hence the desired equation if $r_{n+1} = 0$. If $r_1/r_0 = m_1/m_0$ where $m_1$, $m_0$ are natural numbers, set $c = m_0r_0^{-1}$. Then $cr_0 = m_0$, $cr_1 = m_1$. Set $m_n = cr_n$. Then $m_{n-1} = a_nm_n + m_{n+1}$. Thus all the numbers $m_0 > m_1 > m_2 > m_3 > \cdots$ are natural numbers or zero. It follows that within $m_0$ steps one must arrive at $m_n = 0$, $r_n = 0$. If $r_n$ is never zero, the assumption $r_1/r_0 = m_1/m_0$ must be false, i.e. $r_1/r_0$ is irrational. Dividing $p_{n-1}q_n - p_nq_{n-1} = (-1)^n$ by $q_{n-1}q_n$ gives the stated relation, from which
$$\frac{p_n}{q_n} = \frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} + \frac{p_{n-1}}{q_{n-1}} - \frac{p_{n-2}}{q_{n-2}} + \cdots + \frac{p_1}{q_1} - \frac{p_0}{q_0} + \frac{p_0}{q_0} = 0 + \frac{1}{q_0q_1} - \frac{1}{q_1q_2} + \frac{1}{q_2q_3} - \frac{1}{q_3q_4} + \cdots + \frac{(-1)^{n+1}}{q_{n-1}q_n}.$$
Since $q_nq_{n+1} = q_n(q_{n-1} + a_{n+1}q_n) > q_nq_{n-1}$, the convergence of $p_n/q_n$ follows by the alternating series test (Exercise 9), and it remains only to show that the limit is $r_1/r_0$. Since $r_1/r_0 = (p_{n-1}r_{n+1} + p_nr_n)/(q_{n-1}r_{n+1} + q_nr_n)$, it follows that $r_1/r_0$ lies between $p_{n-1}/q_{n-1}$ and $p_n/q_n$ for all $n$, and hence that $p_n/q_n \to r_1/r_0$.


14 Since $p/q$ lies between $p_{n-1}/q_{n-1}$ and $p_n/q_n = (p_{n-1}/q_{n-1}) \pm (1/q_{n-1}q_n)$, multiplication by $q_{n-1}$ shows that $(pq_{n-1})/q$ lies between an integer and that integer $\pm(1/q_n)$. This clearly implies that the fraction $(pq_{n-1})/q$ has a denominator larger than $q_n$.


15 It is easily shown that $p_{n+1} = q_n$ and $q_{n+1} = p_n + q_n$. Hence $p_{n+1}/q_{n+1} = q_n/(p_n + q_n) = [(p_n/q_n) + 1]^{-1}$. Passing to the limit as $n \to \infty$ gives $x = (x + 1)^{-1}$ and hence the desired equation. By the quadratic formula, $r_1/r_0 = \frac12(\sqrt5 - 1)$. By the margin diagram, the chord subtended by an angle of $\pi/5$ satisfies $1 : x = x + 1 : 1$. To compute this number, square the matrix $\begin{pmatrix}0&1\\1&1\end{pmatrix}$ several times until $q_nq_{n-1} > 1{,}000$. Squaring the matrix four times gives its sixteenth power $\begin{pmatrix}610&987\\987&1597\end{pmatrix}$, from which it follows that $610/987$ represents the limit with about six-place accuracy. Hence it is .618 to three places.
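The four squarings in Exercise 15 are easy to check by machine. This sketch, which uses a hypothetical `mat_mul` helper, also compares $610/987$ with $\frac12(\sqrt5 - 1)$.

```python
import math


def mat_mul(x, y):
    """Product of two 2x2 integer matrices given as nested lists."""
    return [[x[0][0] * y[0][0] + x[0][1] * y[1][0], x[0][0] * y[0][1] + x[0][1] * y[1][1]],
            [x[1][0] * y[0][0] + x[1][1] * y[1][0], x[1][0] * y[0][1] + x[1][1] * y[1][1]]]


m = [[0, 1], [1, 1]]
for _ in range(4):  # squaring four times gives the 16th power
    m = mat_mul(m, m)
print(m)  # [[610, 987], [987, 1597]]

golden = (math.sqrt(5) - 1) / 2
print(610 / 987, golden, abs(610 / 987 - golden))
```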


16 $2\sin(\pi/720) = 2[(\pi/720) - (1/6)(\pi/720)^3 + \cdots] = (10\pi/60^2) - (1/3)(\pi/720)^3 + \cdots$. The second term is much smaller than $60^{-3}$, and all terms other than the first can be ignored. Since $10\pi = 31.4159\ldots$, this term can be written $31 \cdot 60^{-2} + [.4159\ldots]60^{-2} = 31 \cdot 60^{-2} + [24.95\ldots]60^{-3}$, which gives Ptolemy's result.
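The sexagesimal expansion in Exercise 16 can be checked in floating point. This sketch truncates rather than rounds, so the $60^{-3}$ place shows 24 before $24.95\ldots$ is rounded up to 25. The helper `sexagesimal_digits` is hypothetical.

```python
import math


def sexagesimal_digits(x, places):
    """First `places` base-60 digits of x (0 <= x < 1), truncated, not rounded."""
    digits = []
    for _ in range(places):
        x *= 60
        d = int(x)
        digits.append(d)
        x -= d
    return digits


chord = 2 * math.sin(math.pi / 720)
print(sexagesimal_digits(chord, 4))  # [0, 31, 24, 56]
print(round(chord * 60 ** 3))        # 1885 = 31*60 + 25
```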

17 (a) Multiplying by $10^n$, this becomes the statement that for every real number $x$ there is an integer $i$ such that $|x - i| < 1$. This is obviously true. (b) Two integers which lie within 1 of the same real number lie within 2 of each other and hence differ by at most $\pm 1$.


18 The error is at most half of $60^{-6}$, which is just short of eleven decimal places.


19 (a) Let $r_n$ be the largest $n$-place decimal fraction less than $r$. Then the sequence $\{r_n\}$ and the real number $r$ lie in the interval $\{r_n \le x \le r_n + 10^{-n}\}$ for all $n$, hence $\{r_n\}$ converges and its limit is $r$. (b) Let $q_n$ be another sequence such that $q_n \to r$ and such that $q_n$ is the first $n$ places of an infinite decimal fraction. Assume the fraction is $> 0$. Then the sequence $q_n$ is nondecreasing. Hence $q_n \le r$, with equality only if $q_n = q_{n+1} = q_{n+2} = \cdots$. Since $q_n + 10^{-n} > q_m$ for $m > n$, it follows that $q_n + 10^{-n} \ge r$ and $q_n$ is the greatest $n$-place decimal less than $r$; hence $q_n = r_n$ (all $n$), unless $r$ is a decimal fraction.


§9.2 pages 386-387


1 Polynomial functions whose coefficients are 
natural numbers. 


2 Implicit differentiation shows that the derivative of $\log x$ is $x^{-1}$. Since $\log 1 = 0$, the first equation is the Fundamental Theorem applied to $F(x) = \log x$. The second equation follows from the fact that $d/dh$ of $x^h = e^{h\log x}$ at $h = 0$ is $\log x$, i.e. $\lim_{n\to\infty}[x^{1/n} - x^0]/(1/n) = \log x$.

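3 $\operatorname{Arcsin} 0 = 0$. Together with $dx = \cos y\,dy$ and $\cos y = \sqrt{1 - \sin^2 y}$ (cosine is $> 0$ for $-(\pi/2) < y < (\pi/2)$), this gives $dy/dx = (1 - x^2)^{-1/2}$ and therefore $\operatorname{Arcsin} x = \int_0^x (1 - t^2)^{-1/2}\,dt$ for $|x| < 1$. As $y$ varies from $-\pi/2$ to $\pi/2$, $x = \tan y$ varies from $-\infty$ to $\infty$. Since $dx = (1 + x^2)\,dy$ and $0 = \tan 0$, it follows that $\operatorname{Arctan} x = \int_0^x (1 + t^2)^{-1}\,dt$ for all $x$.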

4 Let $(x(t), y(t))$ be a differentiable closed curve which does not pass through $(0, 0)$. Without loss of generality one may assume that the range of $t$ is $0 \le t \le 1$. Choose $(r_0, \theta_0)$ such that $(x(0), y(0)) = (r_0\cos\theta_0, r_0\sin\theta_0)$. Define $r(t) = \sqrt{x(t)^2 + y(t)^2}$ and $\theta(t)$ by $\theta(t) = \theta_0 + \int_0^t d\theta$, where $d\theta$ denotes the pullback of $(x\,dy - y\,dx)/(x^2 + y^2)$. Then $(x(t), y(t)) = (r(t)\cos\theta(t), r(t)\sin\theta(t))$ because they are equal for $t = 0$ and have the same derivative for all $t$. Since $(x(1), y(1)) = (x(0), y(0))$, it follows that $\cos\theta(1) = \cos\theta(0)$ and $\sin\theta(1) = \sin\theta(0)$. This implies that $\theta(1)$ differs from $\theta(0)$ by a multiple of $2\pi$.

5 If $z = a_nw^n + \cdots$ then $dz = [na_nw^{n-1} + (n - 1)a_{n-1}w^{n-2} + \cdots]\,dw$. Now $dz = dx + i\,dy$ and $dw = du + i\,dv$. The expression in brackets can be expanded algebraically and put in the form $p(u, v) + iq(u, v)$. Then $dx = p(u, v)\,du - q(u, v)\,dv$, $dy = q(u, v)\,du + p(u, v)\,dv$, $dx\,dy = (p^2 + q^2)\,du\,dv$, and $dx\,dy \ne 0$ except when $p = q = 0$.


6 Since $|e^{-t^2}\cos xt| \le e^{-t}$ for $t \ge 1$, the integral converges for all $x$ by comparison with $\int_1^\infty e^{-t}\,dt = e^{-1}$. It is largest when $x = 0$, because then $\cos xt = 1$ and $e^{-t^2}$ is undiminished when multiplied by $\cos xt$. Since $\cos(-xt) = \cos xt$, it follows that $f(-x) = f(x)$. If $x$ is very large then the alternating series test shows that $f(x)$ is at most the area of the central bump of the curve $y = e^{-t^2}\cos xt$, which is small. Thus $\lim_{x\to\infty} f(x) = 0$.


§9.3 pages 391-392


1 The triangle inequality suffices to prove boundedness. Since $f(x + h) - f(x)$ is a polynomial in $x_1, x_2, \ldots, x_n, h_1, h_2, \ldots, h_n$ in which every term contains an $h$, it follows that $|f(x + h) - f(x)| \le |h_1||P_1| + |h_2||P_2| + \cdots + |h_n||P_n|$, where each $P_i$ is a polynomial in $2n$ variables. Since $|P_i| \le K_i$ whenever $|x| \le K$ and $|h| \le 1$, it follows that $|f(x + h) - f(x)|$ can be made small by making $|h| = \max(|h_1|, |h_2|, \ldots, |h_n|)$ small.

2 $s^{-1}[f(x + sh) - f(x)]$ is a polynomial in $x_1, x_2, \ldots, x_n, h_1, h_2, \ldots, h_n, s$, hence it is uniformly continuous by Exercise 1.


3 $1 = \sum_{j=1}^n [(j/n)^4 - ((j - 1)/n)^4] = \sum_{j=1}^n n^{-4}[j^4 - j^4 + 4j^3 - 6j^2 + 4j - 1]$. This differs from the proposed approximation $\sum_{j=1}^n 4(j/n)^3n^{-1}$ by $n^{-2}\sum_{j=1}^n [6(j/n)^2 - 4(j/n)n^{-1} + n^{-2}]$. The terms of the sum are easily less than 11 and there are $n$ of them, hence $11/n$ is a bound on the difference. More generally, $(x + h)^4 - x^4 = 4x^3h + h^2[6x^2 + 4xh + h^2]$, which shows that $(x + h)^4 - x^4$ differs from $4x^3h$ by at most $11h^2$ when $0 \le x \le 1$ and $|h| \le 1$. Thus over an interval $\{x_i - k \le x \le x_i + h\}$ the difference $(x_i + h)^4 - (x_i - k)^4 = [(x_i + h)^4 - x_i^4] - [(x_i - k)^4 - x_i^4]$ differs from $4x_i^3(\Delta x) = 4x_i^3[h + k]$ by at most $22(\Delta x)^2$. Summing over all intervals of a subdivision of $\{0 \le x \le 1\}$ shows that $1^4 - 0^4$ differs from $\sum(\alpha)$ by at most $\sum 22(\Delta x)^2 \le \sum 22(\Delta x)\epsilon = 22\epsilon$, provided all the $\Delta x$'s are less than $\epsilon$. Thus $\sum(\alpha)$ can be made to differ arbitrarily little from 1 by making $|\alpha|$ small.
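For the equally spaced subdivision, the bound $11/n$ can be compared with the exact error. In this sketch, exact rational arithmetic gives $\sum_{j=1}^n 4(j/n)^3n^{-1} = (n + 1)^2/n^2$, so the error is $2/n + 1/n^2$. The name `right_sum` is illustrative.

```python
from fractions import Fraction


def right_sum(n):
    """Exact value of sum_{j=1}^{n} 4 (j/n)^3 (1/n)."""
    return sum(4 * Fraction(j, n) ** 3 * Fraction(1, n) for j in range(1, n + 1))


for n in (10, 100, 1000):
    error = right_sum(n) - 1
    # Closed form: error = 2/n + 1/n^2, comfortably inside the bound 11/n.
    print(n, float(error), 11 / n)
```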


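4 Given $\epsilon$, choose $\delta$ such that $|h^{-1}[f(x + h) - f(x)] - f'(x)| < \epsilon$ when $|h| < \delta$ and $0 \le x \le 1$. Then on any interval $\{x_i - k \le x \le x_i + h\}$ with $h, k < \delta$, the change in $f$, namely $f(x_i + h) - f(x_i - k)$, differs from $f'(x_i)(\Delta x) = f'(x_i)(h + k)$ by at most $\epsilon h + \epsilon k = \epsilon\Delta x$. Summing shows that $f(1) - f(0)$ differs from $\sum(\alpha) = \sum f'(x_i)\,\Delta x_i$ by at most $\sum \epsilon\,\Delta x = \epsilon$.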

5 $(x + h)^{-1} - x^{-1} = -h/x(x + h)$. If $|x| \ge a$ and $|h| \le a/2$, this is at most $2|h|a^{-2}$ in absolute value, hence it can be made small by making $|h|$ small. On the other hand, given any $\delta > 0$ it is easy to find numbers $x$, $x'$ such that $|x - x'| < \delta$ but $|x^{-1} - (x')^{-1}| > 1$.

6 The fact that the derivative of $e^x$ at $x = 0$ is 1 means that given $\epsilon > 0$ there is a $\delta$ such that $(e^h - 1)/h$ differs from 1 by less than $\epsilon$ whenever $|h| < \delta$. Hence $|e^h - 1| < (1 + \epsilon)|h|$. Since $|e^{x+h} - e^x| = e^x|e^h - 1|$, to prove that $e^x$ is uniformly continuous on $\{|x| \le K\}$ it suffices to prove that $e^x$ is bounded on $\{|x| \le K\}$. From the power series $e^x = 1 + x + (x^2/2) + \cdots + (x^n/n!) + \cdots$ it follows easily that $|e^x| \le e^K$ when $|x| \le K$. Since $(e^{x+h} - e^x)/h = e^x(e^h - 1)/h$ and since $e^x$ is uniformly continuous, to prove that $e^x$ is uniformly differentiable it suffices to prove that $(e^h - 1)/h$ is uniformly continuous for $0 < |h| \le 1$. Let $f(h)$ denote this function $(e^h - 1)/h$. Given $\epsilon > 0$ there is a $\delta > 0$ such that $|f(h) - 1| < \epsilon/2$ whenever $|h| < \delta$. It is easily shown that $f(h)$ is uniformly continuous on $\{\delta/2 \le |h| \le 1\}$. Hence there is a $\delta_0$ with the following property: if $|h|$ and $|h'|$ both lie between $\delta/2$ and 1, and if $|h - h'| < \delta_0$, then $|f(h) - f(h')| < \epsilon$. Let $\delta_1 = \min(\delta_0, \delta/2)$. Then $|h| \le 1$, $|h'| \le 1$, $|h - h'| < \delta_1$ imply $|f(h) - f(h')| < \epsilon$. Since $\epsilon$ was arbitrary, the result follows.


7 Because sine and cosine are periodic [$\sin(x + 2\pi) = \sin x$, $\cos(x + 2\pi) = \cos x$], it suffices to prove uniform differentiability on the interval $\{|x| \le \pi\}$.

8 $e^{x\log 10}$ is uniformly continuous and agrees with $10^x$ whenever $x$ is rational.




9 $h^{-1}[f(x + h) - f(x)] = \int_0^1 h^{-1}[\cos(x + h)t - \cos xt]\,dt$. For small $h$ the integrand differs by less than $\epsilon$ (in fact less than $\epsilon t$) from $-t\sin xt$, hence the integral differs by less than $\epsilon$ from $-\int_0^1 t\sin xt\,dt$. This is a uniformly continuous function of $x$, and the result follows.


10 $|x|^p = e^{p\log|x|}$ by definition. Prove that $\log|x| = \int_1^{|x|} dt/t$ is uniformly continuous on $\{\delta/2 \le |x| \le K\}$ for any $\delta$; hence $|x|^p$ is too. Following 6, it then suffices to show that $\lim_{x\to 0}|x|^p = 0$, which is easy.


11 For $x \ne 0$ the limit exists because $f(x)$ is the product of the differentiable functions $x^2$ and $\sin(1/x^2)$. For $x = 0$ the limit is $\lim_{h\to 0} h^{-1}[h^2\sin(h^{-2})]$. Since $|\sin(h^{-2})| \le 1$, this number has absolute value $\le |h|$ and the limit is 0. For $x \ne 0$ the derivative is $2x\sin x^{-2} - 2x^{-1}\cos x^{-2}$, which does not approach 0 as $x \to 0$.


12 For any $\epsilon$, $|h^{-1}(f(x + h) - f(x)) - f'(x)| < \epsilon$ for $h$ sufficiently small. If $\epsilon < f'(x)$ this implies $h^{-1}[f(x + h) - f(x)] > 0$; hence if $h > 0$ it implies $f(x + h) > f(x)$, so $f(x)$ is not a maximum.


13 Same as 12 
14 See §2.3 


15 Uniformly continuous means that for every $\epsilon > 0$ there is a $\delta > 0$ such that $|f(n^{-1}) - f(m^{-1})| < \epsilon$ whenever $|n^{-1} - m^{-1}| < \delta$. If $N > 2\delta^{-1}$ then $n, m > N$ implies $|n^{-1} - m^{-1}| < \delta$, hence the sequence $f(n^{-1})$ is convergent. Conversely, if $f(n^{-1})$ is convergent then $|f(n^{-1}) - f(m^{-1})| < \epsilon$ for $n, m \ge N$. Take $\delta = (N - 1)^{-1} - N^{-1}$. Then $|n^{-1} - m^{-1}| < \delta$ implies $n = m$ or $n, m \ge N$, and uniform continuity follows.


16 $h^{-1}[F(y + h) - F(y)] = \int_a^b h^{-1}[f(x, y + h) - f(x, y)]\,dx$. When $h$ is sufficiently small the integrand differs by at most $\epsilon$ from $\partial f/\partial y$ for all $x$, $y$, hence the integral differs by at most $\epsilon(b - a)$ from $\int_a^b (\partial f/\partial y)\,dx$ for all $y$. It follows that $F$ is uniformly differentiable.


§9.4 pages 398-399 


1 Let $P_i$ be the midpoint of $C_i$. If $j > i$ then $|P_j - P_i| \le 2^{-1}10^{-i}$ because $P_j$ lies in $C_i$. Therefore the sequence $P_0, P_1, P_2, \ldots$ converges. Let $P_\infty$ be its limit. Passing to the limit as $j \to \infty$ then gives $|P_\infty - P_i| \le 2^{-1}10^{-i}$. Hence $P_\infty$ lies in $C_i$ for all sufficiently large $i$, and therefore $P_\infty$ lies in all $C_i$. If $Q$ also lies in $C_i$ for all $i$ then $|Q - P_\infty| \le 10^{-i}$ for all $i$, hence $Q = P_\infty$.

2 Let $U_i = \{$all points not in $C_i\}$. If no point lies in all the $C_i$ then every point of $C_0$ lies in at least one of the $U_i$, hence the $U_i$ form an interior cover of $C_0$. Thus a finite number of the $U_i$ cover $C_0$, i.e. there is a finite number of the $C$'s such that no point lies in all of them. This contradicts the assumption that the $C$'s are nested, so the assumption that no point lies in all $C_i$ is false. Hence some point lies in all the $C_i$.


3 It suffices to show that if $x^{(1)}, x^{(2)}, x^{(3)}, \ldots$ is an infinite sequence of points of $X$, then there is an $x^{(\infty)}$ in $X$ satisfying the condition of the Bolzano-Weierstrass Theorem. Since the sequence $x^{(i)}$ is bounded, it follows from the Bolzano-Weierstrass Theorem applied to a cube $\{|x| \le K\}$ that there is such a point $x^{(\infty)}$, and it suffices to show that $x^{(\infty)}$ is in $X$. Let $U_j$ be all points whose distance from $x^{(\infty)}$ is strictly greater than $j^{-1}$. Then $U_1 \subset U_2 \subset \cdots$, and for each $j$ there is a point of $X$ not in $U_j$; hence no finite number of the $U$'s cover $X$. Hence by Heine-Borel the $U$'s do not cover $X$, and some point of $X$ lies in no $U$. Hence $x^{(\infty)}$ is in $X$.

4 Let $y^{(1)}, y^{(2)}, \ldots$ be an infinite sequence of points in $f(X)$. It suffices to show that there is a point $y^{(\infty)}$ as in Bolzano-Weierstrass. For each $i$ let $x^{(i)}$ be a point of $X$ such that $f(x^{(i)}) = y^{(i)}$. Let $x^{(\infty)}$ be as in Bolzano-Weierstrass, and set $y^{(\infty)} = f(x^{(\infty)})$. Given $\epsilon > 0$ there is, because $f$ is continuous, a $\delta > 0$ such that $|x' - x^{(\infty)}| < \delta$ implies $|f(x') - f(x^{(\infty)})| < \epsilon$. By assumption there are an infinite number of $x^{(i)}$ such that $|x^{(i)} - x^{(\infty)}| < \delta$, hence an infinite number of $y^{(i)}$ such that $|y^{(i)} - y^{(\infty)}| < \epsilon$.
5 Combine Theorem 3 with Exercise 16, §9.3. 


§9.5 pages 404-407


1 If $a_n = x^n/n!$ then $|a_{n+1}/a_n| = |x|/(n + 1) \to 0$, and the ratio test implies convergence.
2 If $|x| \ge 1$ then $|x|^n \ge 1$, and the definition of convergence is not fulfilled (take $n = m + 1$). Since $(1 - x)(1 + x + x^2 + \cdots + x^n) = 1 - x^{n+1}$ and since $x^{n+1} \to 0$ for $|x| < 1$, the desired result follows as $n \to \infty$, because $A(\lim x_n) = \lim(Ax_n)$ for any constant $A$.

3 If $|x| \ge 1$ then the factors do not approach 1, and the definition of convergence is not fulfilled (take $n = m + 1$). The identities $(1 - x)(1 + x) = 1 - x^2$, $(1 - x)(1 + x)(1 + x^2) = 1 - x^4$, $(1 - x)(1 + x)(1 + x^2)(1 + x^4) = 1 - x^8$, etc., show that $1 - x$ times the infinite product is 1.


4 $\pi^2/6 = 1.64493\ldots$. To obtain this accuracy it would suffice to take the sum of the first 100,000 terms. Needless to say, this is not the method by which Euler estimated $1 + \frac14 + \frac19 + \frac1{16} + \cdots$.
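The slow convergence mentioned in Exercise 4 is easy to observe. The sketch sums the first 100,000 terms with `math.fsum`, which keeps rounding error out of the comparison. The remaining tail is a little less than $1/100{,}000$.

```python
import math

N = 100_000
partial = math.fsum(1 / (k * k) for k in range(1, N + 1))
target = math.pi ** 2 / 6
print(partial, target, target - partial)  # tail is slightly less than 1/N
```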

5 Set $x = \pi/2$ and use $1 - (2n)^{-2} = (2n - 1)(2n + 1)(2n)^{-2}$.

6 The suggested method shows that for every $\epsilon > 0$ there is an $M$ such that the sum of the first $m(k + 1)$ terms differs from $\log k$ by less than $\epsilon/2$ whenever $m > M$. The sums between the $m(k + 1)$st and the $(m + 1)(k + 1)$st differ from the $m(k + 1)$st by at most $(1/(mk + 1)) + (1/(mk + 2)) + \cdots + (1/(mk + k)) < (k/mk) = (1/m)$, which is less than $\epsilon/2$ for $m$ sufficiently large.


7 Add positive terms until a total greater than 
10 is reached, then add negative terms until a 
total less than 10 is reached (one term will do it), 
then add positive terms until a total greater 
than 10 is reached, then add negative terms, etc. 
It is easily shown that all negative (and positive) 
terms are eventually used and that the sum 
approaches 10. 


8 Given $\epsilon > 0$, use the convergence of $\sum_{n=1}^\infty |a_n|$ to find an $N$ such that the sum of any finite collection of $|a_n|$ for which $n > N$ is $< \epsilon$. It follows that the sum of the first $M$ terms of any rearrangement of $\sum a_n$ differs from $a_1 + a_2 + \cdots + a_N$ by at most $\epsilon$, provided only that $M$ is large enough to include all the terms $a_1, a_2, \ldots, a_N$. Thus the sum of the first $M$ terms differs from $\sum_{n=1}^\infty a_n$ by at most $2\epsilon$.


9 Let $\sum p_n$ be the series of positive terms obtained by striking all negative terms from $\sum a_n$, and let $\sum q_n$ be the series of negative terms obtained by striking the positive terms. Either $\sum p_n$ diverges or $\sum q_n$ diverges, since otherwise $\sum a_n$ converges absolutely. Thus both $\sum p_n$ and $\sum q_n$ must diverge, since otherwise $\sum a_n$ would diverge. Add terms $p_n$ until 10 is passed, then add terms $q_n$ until 10 is passed, etc. This gives a rearrangement of $\sum a_n$ converging to 10.

10 $\left|\prod_{i=1}^n a_i - \prod_{i=1}^m a_i\right| = \left|\prod_{i=1}^m a_i\right|\,|a_{m+1}a_{m+2}\cdots a_n - 1| \le |P|\epsilon$ for $n > m > N$. Therefore the sequence $x_n = \prod_{i=1}^n a_i$ is convergent.

11 Since $|1 - a_i|$ is small for large $i$ (by convergence), $\log a_i$ is defined for large $i$. Then $a_{m+1}a_{m+2}\cdots a_n$ is near 1 if and only if $\log(a_{m+1}a_{m+2}\cdots a_n)$ is near $\log 1$ (by continuity of log).

12 If $\prod(1 + b_i)$ converges then its value is a bound on the increasing sequence $b_1, b_1 + b_2, b_1 + b_2 + b_3, \ldots$, which therefore converges (Exercise 10, §9.1). Similarly, if $\sum b_i$ converges then $(1 + b_1), (1 + b_1)(1 + b_2), \ldots$ is increasing and bounded.


13 A number is near 1 if and only if its inverse is near 1. Hence $a_{m+1}^{-1}a_{m+2}^{-1}\cdots a_n^{-1}$ is near 1 if and only if $a_{m+1}a_{m+2}\cdots a_n$ is. Since $\left(\prod_{i=1}^n a_i\right)\left(\prod_{i=1}^n a_i^{-1}\right) = 1$, the desired formula follows by letting $n \to \infty$.


14 By the method of 12 it follows that $\prod(1 - b_i)^{-1}$ converges if and only if $\prod(1 + b_i)$ converges. Then apply 13.

15 $\cos x$ is zero for $x = \pm\pi/2, \pm 3\pi/2, \ldots$ and is 1 when $x = 0$, which leads to the conjecture that $\cos x = (1 - (2x/\pi)^2)(1 - (2x/3\pi)^2)\cdots$. Since $\cos x = 1 - (x^2/2) + \cdots$, equating the terms in $x^2$ gives $-(1/2) = -(4/\pi^2) - (4/9\pi^2) - (4/25\pi^2) - \cdots$, i.e. $(\pi^2/8) = 1 + (1/9) + (1/25) + \cdots$. On the other hand, $1 + (1/9) + (1/25) + \cdots = (1 + (1/4) + (1/9) + \cdots) - (1/4)(1 + (1/4) + (1/9) + \cdots) = (\pi^2/6) - (1/4)(\pi^2/6) = (\pi^2/8)$.

16 Using 12-14, in order to prove that $\sum(1/p)$ diverges, it suffices to show that $\prod(1 - p^{-1})^{-1}$ diverges. For this it suffices to show that this product can be made arbitrarily large by including enough terms. Let $N$ be a given integer. For each prime $p$ let $S_p = 1 + p^{-1} + p^{-2} + \cdots + p^{-k}$, where $p^k$ is the largest power of $p$ less than $N$. Then $S_p < (1 - p^{-1})^{-1}$, and $S_p = 1$ for all primes $p > N$. Thus $\prod S_p$ is a finite product. It is easily seen that $\prod S_p > 1 + 1/2 + 1/3 + \cdots + 1/(N - 1)$. Since the series $\sum(1/n)$ diverges, this can be made large, hence $\prod(1 - p^{-1})^{-1}$ can be made large.
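For a concrete $N$ the inequality can be checked exactly. This sketch compares $\prod_{p < N}(1 - p^{-1})^{-1}$ with $1 + 1/2 + \cdots + 1/(N - 1)$ in rational arithmetic. The sieve helper `primes_below` and the choice $N = 100$ are illustrative.

```python
from fractions import Fraction


def primes_below(n):
    """Primes p < n by the sieve of Eratosthenes."""
    sieve = [True] * n
    sieve[:2] = [False] * min(2, n)
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, n, p))
    return [p for p in range(n) if sieve[p]]


N = 100
product = Fraction(1)
for p in primes_below(N):
    product *= Fraction(p, p - 1)  # the factor (1 - 1/p)^(-1)
harmonic = sum(Fraction(1, m) for m in range(1, N))  # 1 + 1/2 + ... + 1/(N - 1)
print(float(product), float(harmonic))
```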


17 Set $u = t\sqrt\pi$. Then $\int_{-\infty}^\infty \exp(-\pi t^2)\,dt = \pi^{-1/2}\int_{-\infty}^\infty \exp(-u^2)\,du = 1$.
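This is a numerical check of the normalization and is not part of the answer. The trapezoid rule on $[-8, 8]$ gives 1 to high accuracy. The interval and step size are assumptions of the sketch; the neglected tails are far below rounding error.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoid rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + k * h) for k in range(1, n)) + 0.5 * f(b))


value = trapezoid(lambda t: math.exp(-math.pi * t * t), -8.0, 8.0, 1600)
print(value)
```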

18 Because the integrand is not defined at $x = 0$, the domain of integration must be taken to be the non-compact set $\{0 < x \le 1\}$. Since $\lim_{\epsilon\to 0}\int_\epsilon^1 x^{-a}\,dx = \lim_{\epsilon\to 0}[x^{1-a}/(1 - a)]_\epsilon^1 = 1/(1 - a) - \lim_{\epsilon\to 0}(\epsilon^{1-a}/(1 - a))$, it exists and is $(1 - a)^{-1}$ provided $a < 1$.

19 Set $x = u$, $y = uv$. Then the pullback of $\exp(-(x^2 + y^2))\,dx\,dy$ is $\exp(-u^2(1 + v^2))\,u\,du\,dv$. Writing the double integral as an iterated integral and using $\int_0^\infty \exp(-u^2k)\,u\,du = 1/2k$ and $\int_0^\infty (1 + v^2)^{-1}\,dv = \pi/2$ gives the value $\pi/4$.


§9.6 pages 419-426


1 $(1 + t^2)\left(1 - t^2 + t^4 - \cdots + (-t^2)^n\right) = 1 - (-t^2)^{n+1}$, hence $(1 + t^2)^{-1} = 1 - t^2 + t^4 - \cdots + (-t^2)^n + (-t^2)^{n+1}/(1 + t^2)$. The last term is uniformly small for
$0 \le t \le x < 1$ when $n$ is large, from which
$\int_0^x (1 + t^2)^{-1}\,dt = \sum_{n=0}^\infty \int_0^x (-t^2)^n\,dt = \sum_{n=0}^\infty (-1)^n x^{2n+1}/(2n + 1)$. Thus the series
$1 + 0 - 1/3 + 0 + 1/5 + 0 - 1/7 + \cdots$ is Abel
summable, and its Abel sum is $\lim_{x\to 1}\int_0^x (1 + t^2)^{-1}\,dt = \int_0^1 (1 + t^2)^{-1}\,dt = \pi/4$ (see §7.5).
Since $1 - 1/3 + 1/5 - 1/7 + \cdots$ is convergent, its
sum is $\pi/4$ by Abel's Theorem.
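The Abel-summation argument can be illustrated numerically. For $x$ slightly less than 1, the power series $\sum (-1)^n x^{2n+1}/(2n + 1)$ agrees with $\arctan x = \int_0^x (1 + t^2)^{-1}\,dt$. Separately, the ordinary partial sums of $1 - 1/3 + 1/5 - \cdots$ creep toward $\pi/4$. The choice $x = 0.999$, the term counts and the function names are arbitrary choices for this sketch.

```python
import math

def arctan_series(x, n_terms=200_000):
    """Truncation of sum_{n>=0} (-1)^n x^(2n+1) / (2n+1), valid for |x| < 1."""
    total, power = 0.0, x
    for n in range(n_terms):
        total += (-1) ** n * power / (2 * n + 1)
        power *= x * x
    return total

def leibniz_partial(n_terms):
    """Partial sum 1 - 1/3 + 1/5 - ... with n_terms terms."""
    return sum((-1) ** n / (2 * n + 1) for n in range(n_terms))

print(arctan_series(0.999), math.atan(0.999))
print(leibniz_partial(100_000), math.pi / 4)
```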


2 $(\pi/a)^{1/2}\,e^{-y^2/4a}$


3 The given integral is equal to $(1/2)\int_{-\infty}^{\infty} (\sin (x + 1)y/y)\,dy - (1/2)\int_{-\infty}^{\infty} (\sin (x - 1)y/y)\,dy$, which is 0 if $|x| > 1$, $\pi$ if $|x| < 1$ and $\pi/2$ if
$|x| = 1$.


4 Let $f_N(x)$ denote the sum of the first $N$ terms.
Then for $M > N$, $|f_M(x) - f_N(x)| \le 2^{-(N+1)} + 2^{-(N+2)} + \cdots + 2^{-M} < 2^{-N}(1/2 + 1/4 + \cdots) = 2^{-N}$.
Thus $f(x) = \lim_{N\to\infty} f_N(x)$ exists and satisfies
$|f(x) - f_N(x)| \le 2^{-N}$ for all $x$. The function
$f_N(x)$ is the sum of a finite number of uniformly
continuous functions and can therefore be
shown to be uniformly continuous. Given
$\epsilon > 0$, choose $N$ so large that $2^{-N} < \epsilon/3$ and
choose $\delta$ so small that $|x' - x| < \delta$ implies
$|f_N(x') - f_N(x)| < \epsilon/3$. Then $|x' - x| < \delta$
implies $|f(x') - f(x)| < \epsilon$, so $f$ is uniformly
continuous.

5 Set $f_N(x) = u_1(x) + \cdots + u_N(x)$. Then for $N > K$,
$|f_N(x) - f_K(x)| \le M_{K+1} + \cdots + M_N$ for all
$x$. Since $\sum M_n$ converges, it follows easily that
$\lim_{N\to\infty} f_N(x) = f(x)$ exists and that for every
$\epsilon > 0$ there is an $N$ such that $|f_N(x) - f(x)| < \epsilon/3$ for all $x$. Since each $f_N(x)$ is uniformly continuous, the uniform continuity of $f(x)$ follows
from $|f(x') - f(x)| \le |f(x') - f_N(x')| + |f_N(x') - f_N(x)| + |f_N(x) - f(x)| < (\epsilon/3) + (\epsilon/3) + (\epsilon/3)$, as in 4.

6 If $\sum a_n\bar x^n$ converges, then $a_n\bar x^n \to 0$ as $n \to \infty$,
from which it is easily shown that there is a $K$
such that $|a_n\bar x^n| \le K$ for all $n$. Set $\rho = |\bar x| - (\epsilon/2)$ and $M_n = |a_n|\rho^n \le K|\rho/\bar x|^n$. Since
$|\rho/\bar x| < 1$, the series $\sum M_n$ converges.


7 $(1 - x + x^2 - x^3 + \cdots)(1 + x + x^2 + x^3 + \cdots) = 1 + (1 - 1)x + (1 - 1 + 1)x^2 + (1 - 1 + 1 - 1)x^3 + \cdots = 1 + x^2 + x^4 + \cdots$.

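8 As in 6, there is a $K_1$ such that $|a_n\bar x^n| \le K_1$
for all $n$ and a $K_2$ such that $|b_m\bar x^m| \le K_2$ for all $m$. Let $r$
be slightly less than 1, and set $M_{n,m} = K_1K_2r^{n+m}$. For $|x| \le r|\bar x|$ it follows that
$|a_nb_mx^nx^m| \le M_{n,m}$. The double series
$\sum_{n,m} M_{n,m} = \sum_{m=0}^\infty \left[K_1K_2r^m\sum_{n=0}^\infty r^n\right] = \sum_{m=0}^\infty K_1K_2r^m(1 - r)^{-1} = K_1K_2(1 - r)^{-2}$
converges, so $\sum a_nb_mx^nx^m$ converges absolutely. Therefore it can
be summed either in the order (*) or in the order
$\sum_{m=0}^\infty \left[b_mx^m\sum_{n=0}^\infty a_nx^n\right] = g(x)\sum_{n=0}^\infty a_nx^n = g(x)f(x)$.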
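Summing in the order (*) of Exercises 7 and 8 amounts to forming the Cauchy product of the two coefficient sequences. A minimal sketch follows, with the coefficient lists truncated at an arbitrary 40 terms. It reproduces $1 + x^2 + x^4 + \cdots$ from $(1 - x + x^2 - \cdots)(1 + x + x^2 + \cdots)$. It then compares values at one point inside the interval of convergence.

```python
def cauchy_product(a, b):
    """Coefficients c_k = sum_{n+m=k} a_n b_m, truncated to min(len(a), len(b)) terms."""
    size = min(len(a), len(b))
    return [sum(a[n] * b[k - n] for n in range(k + 1)) for k in range(size)]

def evaluate(coeffs, x):
    """Value of the truncated power series sum_k coeffs[k] x^k."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

a = [(-1) ** n for n in range(40)]  # 1 - x + x^2 - ... = 1/(1 + x)
b = [1] * 40                        # 1 + x + x^2 + ... = 1/(1 - x)
c = cauchy_product(a, b)
print(c[:8])  # [1, 0, 1, 0, 1, 0, 1, 0]
print(evaluate(c, 0.3), evaluate(a, 0.3) * evaluate(b, 0.3), 1 / (1 - 0.09))
```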


9 Setting $y = -x^2/(n^2\pi^2)$ in $\log(1 + y) = y - (1/2)y^2 + (1/3)y^3 - (1/4)y^4 + \cdots$ and summing over
$n = 1, 2, 3, \ldots$ gives $\log(\sin x/x) = a_2x^2 + a_4x^4 + a_6x^6 + \cdots$, where $a_{2k} = -k^{-1}\pi^{-2k}\left[1 + 2^{-2k} + 3^{-2k} + 4^{-2k} + \cdots\right]$. On the other
hand, differentiating and multiplying by $x\sin x$ gives the equation $(2a_2x + 4a_4x^3 + 6a_6x^5 + \cdots)\left[x^2 - (1/6)x^4 + (1/120)x^6 - \cdots\right] = \left(x - (1/2)x^3 + (1/24)x^5 - (1/720)x^7 + \cdots\right) - \left(x - (1/6)x^3 + (1/120)x^5 - (1/5040)x^7 + \cdots\right)$, which yields $a_2 = -1/6$, $a_4 = -1/180$, $a_6 = -1/2835$. This verifies the first two formulas
and indicates that $1 + 2^{-6} + 3^{-6} + 4^{-6} + \cdots = \pi^6/945$.
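The coefficients $a_2 = -1/6$, $a_4 = -1/180$, $a_6 = -1/2835$ can be recovered with exact rational arithmetic by matching powers of $x$ in the equation above, one power at a time. The sketch then turns each coefficient into a value of $1 + 2^{-2k} + 3^{-2k} + \cdots$ using $a_{2k} = -k^{-1}\pi^{-2k}[\cdots]$. It compares that value with a direct partial sum truncated at an arbitrary $10^5$ terms. The helper names `s`, `r` and `log_sinc_coefficients` are illustrative.

```python
from fractions import Fraction
import math

def s(m):
    """Coefficient of x^(2m+2) in x sin x."""
    return Fraction((-1) ** m, math.factorial(2 * m + 1))

def r(K):
    """Coefficient of x^(2K+1) in x cos x - sin x."""
    return (-1) ** K * (Fraction(1, math.factorial(2 * K)) - Fraction(1, math.factorial(2 * K + 1)))

def log_sinc_coefficients(count):
    """Exact a_2, a_4, ..., a_{2*count} with log(sin x / x) = sum_k a_{2k} x^{2k}.

    Matching x^(2K+1) in (sum_k 2k a_{2k} x^(2k-1)) * (x sin x) = x cos x - sin x
    gives sum_{k=1}^{K} 2k a_{2k} s(K - k) = r(K), solved for a_{2K} since s(0) = 1.
    """
    a = {}
    for K in range(1, count + 1):
        known = sum(2 * k * a[k] * s(K - k) for k in range(1, K))
        a[K] = (r(K) - known) / (2 * K)
    return [a[k] for k in range(1, count + 1)]

coeffs = log_sinc_coefficients(3)
print(coeffs)  # [Fraction(-1, 6), Fraction(-1, 180), Fraction(-1, 2835)]
for k, a2k in enumerate(coeffs, start=1):
    from_coefficient = -k * math.pi ** (2 * k) * float(a2k)
    partial_sum = sum(n ** (-2 * k) for n in range(1, 100_001))
    print(2 * k, from_coefficient, partial_sum)
```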
10 (a) If $y$ is an integer, the expression can be
written as $(y + 1)(y + 2)\cdots(y + N)$ divided
by $N!\,N^y$. Taking $N$ very large and cancelling
$(y + 1)(y + 2)\cdots N$ from numerator and denominator leaves $(N + 1)(N + 2)\cdots(N + y)$
in the numerator and $y!\,N^y$ in the denominator.
Hence it is $(1/y!)$ times $(1 + (1/N))(1 + (2/N))\cdots(1 + (y/N))$, and the second factor
approaches 1 as $N \to \infty$. (b) $0!$ should be 1,
and taking $y = 0$ in (a) gives this answer.
(c) All steps are easy. The inequality follows
from the formula $\log(1 + x) - x = -(1/2)x^2 + (1/3)x^3 - \cdots$, which gives for $|x| \le 1/2$ the desired
inequality with $K = 1/2 + 1/4 + 1/8 + \cdots = 1$.


(d) $\pi y/(\Pi(y)\Pi(-y))$ is the limit as $N \to \infty$ of
$\pi y\prod_{n=1}^N(1 + (y/n))N^{-y}\prod_{n=1}^N(1 - (y/n))N^{y} = \pi y\prod_{n=1}^N(1 - (y^2/n^2))$, which is $\sin \pi y$. Using
$N - 1$ in place of $N$ in the formula for $\Pi(y - 1)$ shows that
$y\Pi(y - 1)/\Pi(y)$ is the limit as $N \to \infty$ of
$\left((y + N - 1)/N\right)\left((y + N)/N\right)(1 - (1/N))^{y-1}$, which is 1. (e) From (d),
$\sin(\pi/2) = \pi/\Pi(-1/2)\Pi(-1/2)$. Since
$\Pi(-1/2) > 0$, this gives $\Pi(-1/2) = \sqrt{\pi}$.
Thus $\Pi(n - (1/2)) = ((2n - 1)/2)\Pi(n - (3/2)) = \cdots = ((2n - 1)/2)((2n - 3)/2)\cdots(3/2)(1/2)\sqrt{\pi}$
and $\Pi(-n - (1/2)) = (-2)(-2/3)$




$(-2/5)\cdots(-2/(2n - 1))\sqrt{\pi}$. (f) At the integers $-5, -4, \ldots, 4, 5$ the value of the function is $0, 0, 0, 0, 0, 1, 1, 1/2, 1/6, 1/24, 1/120$.
The values at the half-integers can be
guessed (approximately) by filling in a smooth
curve through these points, and verified by (e).
The oscillations of $1/\Pi(y)$ between the negative
integers increase greatly as $y \to -\infty$. (E.g.
$1/\Pi(-4\tfrac12) \approx 3.7$.) (g) $1/\Pi(2x)$ is the limit
as $N \to \infty$ of $((1 + 2x)/1)((2 + 2x)/2)\cdots((2N + 2x)/2N)(2N)^{-2x}$,
while $1/\Pi(x)$ is the limit of $((1 + x)/1)((2 + x)/2)\cdots((N + x)/N)N^{-x}$, and $1/\Pi(x - \tfrac12)$ is
the limit of $((1 + 2x)/2)((3 + 2x)/4)\cdots((2N - 1 + 2x)/2N)N^{-x+1/2}$. Thus the expression in question is the limit of an expression
which does not involve $x$. The value of the limit
is $\Pi(-1/2) = \sqrt{\pi}$. (h) Similar to (g). Use
$N$ in the expressions of $\Pi(x), \Pi(x - 1/n), \ldots$
and $nN$ in the expression of $\Pi(nx)$. (i) $u_n = \Pi(-1/n)\Pi(-2/n)\cdots\Pi(-(n - 1)/n)$. By (d),
$\Pi(y - 1)\Pi(-y) = \pi/\sin(\pi y)$. Setting $y = 1/n, 2/n, \ldots, (n - 1)/n$ and multiplying gives
$$u_n^2 = \pi^{n-1}/\left(\sin(\pi/n)\sin(2\pi/n)\cdots\sin((n - 1)\pi/n)\right),$$
$u_2^2 = \pi$, $u_3^2 = 4\pi^2/3$, $u_4^2 = 2\pi^3$, $u_6^2 = 16\pi^5/3$,
which leads to the guess $u_n^2 = (2\pi)^{n-1}/n$.
(j) For $n$ odd, this is essentially the problem of
finding the coefficient of $y^n$ in the expression of
$\sin n\theta$ as a polynomial in $y = \sin\theta$. The
formula
$$\binom{n}{1} + \binom{n}{3} + \cdots + \binom{n}{n} = 2^{n-1}$$


is used. (k) In the formulas obtained by
differentiating (5′), set $a = 1$ and cancel
minus signs to obtain $\Pi((2n - 1)/2) = \int_{-\infty}^{\infty}(u^2)^n e^{-u^2}\,du$. Write this as twice the integral
from 0 to $\infty$ and substitute $t = u^2$. (l) The
integral is proper at $t = 0$. Since $e^t > t^n/n!$ for
all $t > 0$ and all $n$, it follows (using $n + 2$ in
place of $n$) that $t^ne^{-t} < \text{const.}\,t^{-2}$. Since
$\int_1^\infty t^{-2}\,dt$ converges, so does $\int_0^\infty t^ne^{-t}\,dt$. The desired formula follows from $\int_0^\infty t^ne^{-t}\,dt = \lim_{K\to\infty}\int_0^K\{(d/dt)(-t^ne^{-t}) + nt^{n-1}e^{-t}\}\,dt = n\int_0^\infty t^{n-1}e^{-t}\,dt$ and $\int_0^\infty e^{-t}\,dt = 1$. (m) The
integral converges as $t \to \infty$ for all $x$. As $t \to 0$
the term $e^{-t}$ is like 1 and the integral is like
$\int_0^1 t^x\,dt$. By Exercise 18 of §9.5 this converges
for $x > -1$. (n) Set $\Pi(nx) = \Gamma(nx + 1)$,
$\Pi(x) = \Gamma(x + 1)$, etc., and $y = nx + 1$ to obtain $\Gamma(y/n)\Gamma((y + 1)/n)\cdots\Gamma((y + n - 1)/n) = n^{(1/2) - y}(2\pi)^{(n-1)/2}\Gamma(y)$.
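Several of the identities in 10 can be spot-checked with Python's `math.gamma`, using the relation $\Pi(x) = \Gamma(x + 1)$ stated in (n). The sample points ($y = 0.3$, $n$ up to 7, and $y = 2.3$ with $n = 3$) are arbitrary. Agreement at a few points is evidence, not proof. The helper names are illustrative.

```python
import math

def Pi(x):
    """Gauss's factorial function, Pi(x) = Gamma(x + 1)."""
    return math.gamma(x + 1)

def u_squared(n):
    """u_n^2, where u_n = Pi(-1/n) Pi(-2/n) ... Pi(-(n-1)/n) as in (i)."""
    return math.prod(Pi(-k / n) for k in range(1, n)) ** 2

def multiplication_lhs(y, n):
    """Gamma(y/n) Gamma((y+1)/n) ... Gamma((y+n-1)/n), the left side in (n)."""
    return math.prod(math.gamma((y + j) / n) for j in range(n))

def multiplication_rhs(y, n):
    """n^(1/2 - y) (2 pi)^((n-1)/2) Gamma(y), the right side in (n)."""
    return n ** (0.5 - y) * (2 * math.pi) ** ((n - 1) / 2) * math.gamma(y)

print(Pi(-0.5) ** 2, math.pi)                                      # (e)
print(1 / Pi(-4.5))                                                # (f): about 3.7
print(Pi(0.3 - 1) * Pi(-0.3), math.pi / math.sin(0.3 * math.pi))  # (d)
for n in range(2, 8):
    print(n, u_squared(n), (2 * math.pi) ** (n - 1) / n)           # (i)
print(multiplication_lhs(2.3, 3), multiplication_rhs(2.3, 3))      # (n)
```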

11 (a) The formula (*) of 10. (b) The integral converges near 0 if $x > -1$ and near 1 if
$y > -1$, hence converges if both $x > -1$ and
$y > -1$. (c) Set $u = 1 - t$. Then $du = -dt$
and the orientation is reversed. $\int_0^1 t^x\,dt = (x + 1)^{-1}$. (d) Elementary. Note $y > 0$.
(e) $C(x, n) = (n/(x + n))((n - 1)/(x + n - 1))\cdots(1/(x + 1))\,n^x$. (f) $C(x, n) = (1 + (x + 1)/n)\int_0^n u^x(1 - (u/n))^n\,du$. For very large $n$,
the factor in front is nearly 1, the domain of integration is nearly $\{0 < u < \infty\}$, and the integrand is nearly $u^xe^{-u}$. (g) Let $f_n(u)$ be the
function which is $u^x(1 - (u/n))^n$ for $u < n$ and
which is 0 for $u \ge n$. It then suffices to prove
that $\lim_{n\to\infty}\int_0^\infty f_n(u)\,du = \int_0^\infty\left[\lim_{n\to\infty} f_n(u)\right]du$.
(h) Elementary. (i) $B(x, y) = \Pi(x - 1)\Pi(y - 1)/\Pi(x + y - 1) = \Gamma(x)\Gamma(y)/\Gamma(x + y)$.
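The Gauss approximant $C(x, n)$ in (e) and the Beta integral in (i) can both be checked numerically. This sketch assumes the standard definition $B(x, y) = \int_0^1 t^{x-1}(1 - t)^{y-1}\,dt$, which is consistent with (i) but is not restated here. The choices $n = 200\,000$ and $x, y \ge 1$ are arbitrary. The second choice keeps the midpoint rule free of endpoint singularities.

```python
import math

def gauss_approximant(x, n):
    """C(x, n) = n! n^x / ((x+1)(x+2)...(x+n)), evaluated in log space to avoid overflow."""
    log_c = x * math.log(n) - sum(math.log1p(x / k) for k in range(1, n + 1))
    return math.exp(log_c)

def beta_midpoint(x, y, steps=100_000):
    """Midpoint-rule value of integral_0^1 t^(x-1) (1-t)^(y-1) dt (assumes x, y >= 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** (x - 1) * (1 - t) ** (y - 1)
    return h * total

for x in (0.5, 2.0, 3.0):
    print(x, gauss_approximant(x, 200_000), math.gamma(x + 1))  # C(x, n) -> Pi(x)
x, y = 2.5, 3.5
print(beta_midpoint(x, y), math.gamma(x) * math.gamma(y) / math.gamma(x + y))
```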
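12 (a) Both are $2r$, $\pi r^2$, $4\pi r^3/3$. (b) $\omega = r^{-n}\left[x_1\,dx_2\,dx_3\cdots dx_n - x_2\,dx_1\,dx_3\cdots dx_n + \cdots + (-1)^{n-1}x_n\,dx_1\,dx_2\cdots dx_{n-1}\right]$. $\omega$ is
closed (Ex. 4, §8.3). The integral of $\omega$ over $r = $
const. is $r^{-n}$ times the integral of $x_1\,dx_2\cdots dx_n - \cdots$, which is the integral of $n\,dx_1\,dx_2\cdots dx_n$
over the interior, which is $n\sigma_nr^n$, where $\sigma_n$ is the
'volume' of the unit sphere. Thus $\pi^{n/2} = n\sigma_n\int_0^\infty e^{-r^2}r^{n-1}\,dr = (n/2)\sigma_n\int_0^\infty e^{-u}u^{(n/2)-1}\,du = \sigma_n\Pi(n/2)$.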
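Part (b) gives the volume of the unit ball in $n$-space as $\sigma_n = \pi^{n/2}/\Pi(n/2)$. A short sketch checks this against the values in (a). It also checks the intermediate identity $\pi^{n/2} = n\sigma_n\int_0^\infty e^{-r^2}r^{n-1}\,dr$, with the radial integral truncated at $r = 12$ (an arbitrary cut-off) and evaluated by the midpoint rule.

```python
import math

def unit_ball_volume(n):
    """sigma_n = pi^(n/2) / Pi(n/2), using Pi(x) = Gamma(x + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def radial_integral(n, R=12.0, steps=100_000):
    """Midpoint-rule value of integral_0^R exp(-r^2) r^(n-1) dr."""
    h = R / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        total += math.exp(-r * r) * r ** (n - 1)
    return h * total

print([unit_ball_volume(n) for n in (1, 2, 3)], [2, math.pi, 4 * math.pi / 3])
for n in range(1, 6):
    print(n, n * unit_ball_volume(n) * radial_integral(n), math.pi ** (n / 2))
```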

13 Since $y + y^{-1}$ has a minimum at $y = 1$, its
inverse $(y + y^{-1})^{-1}$ has a maximum at $y = 1$.
Thus $\lim_{y\to 1}(d/dy)[y/(1 + y^2)] = 0$. For
$|y| < 1$ this derivative can be written as the sum
of the series $(d/dy)[y - y^3 + y^5 - y^7 + \cdots] = 1 - 3y^2 + 5y^4 - 7y^6 + \cdots = x^{-1}[x - 3x^2 + 5x^3 - 7x^4 + \cdots]$, where $x = y^2$. Thus $\lim_{x\to 1}$ of
$[x - 3x^2 + 5x^3 - \cdots] = 0$.
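The Abel sum found here can be seen numerically. For $0 \le x < 1$ the series $1 - 3x + 5x^2 - 7x^3 + \cdots$ has the closed form $(1 - x)/(1 + x)^2$. This follows from differentiating $y/(1 + y^2)$ as above and substituting $x = y^2$. The sketch compares a truncated sum with that closed form as $x$ approaches 1. The truncation length and sample points are arbitrary.

```python
def odd_alternating_series(x, n_terms=100_000):
    """Truncation of 1 - 3x + 5x^2 - 7x^3 + ... for 0 <= x < 1."""
    total, power = 0.0, 1.0
    for n in range(n_terms):
        total += (-1) ** n * (2 * n + 1) * power
        power *= x
    return total

def closed_form(x):
    """(1 - x) / (1 + x)^2, the derivative of y / (1 + y^2) written in x = y^2."""
    return (1 - x) / (1 + x) ** 2

for x in (0.9, 0.99, 0.999):
    print(x, odd_alternating_series(x), closed_form(x))
```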

14 (a) $S_N$ is 1 or 0 depending on whether $N$ is
odd or even. (b) Let $S_\infty$ be the sum of the
series, and $x_n = S_n - S_\infty$. Then $x_n \to 0$. Given
$\epsilon > 0$, choose $M$ so large that $|x_n| < \epsilon/2$ for
$n > M$. Then for $N > M$, $N^{-1}|x_1 + x_2 + \cdots + x_N| \le (|x_1|/N) + \cdots + (|x_M|/N) + (\epsilon/2)((N - M)/N) \to (\epsilon/2)$ as $N \to \infty$. Thus
$N^{-1}[x_1 + x_2 + \cdots + x_N] \to 0$, that is, $N^{-1}[S_1 + S_2 + \cdots + S_N] \to S_\infty$.
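Cesàro summation as used in 14 is simple to compute. This sketch forms the averages $N^{-1}(S_1 + \cdots + S_N)$ for $1 - 1 + 1 - \cdots$, as in part (a). As an illustration of part (b), it does the same for the convergent geometric series $1 + 1/2 + 1/4 + \cdots$, whose sum is 2. The 10 000-term truncation is arbitrary.

```python
def cesaro_means(terms):
    """Averages N^(-1)(S_1 + ... + S_N) of the partial sums S_N of the given terms."""
    means, partial, running = [], 0.0, 0.0
    for N, term in enumerate(terms, start=1):
        partial += term          # S_N
        running += partial       # S_1 + ... + S_N
        means.append(running / N)
    return means

grandi = [(-1) ** n for n in range(10_000)]    # 1 - 1 + 1 - ..., part (a)
geometric = [0.5 ** n for n in range(10_000)]  # 1 + 1/2 + 1/4 + ..., sum 2
print(cesaro_means(grandi)[-1], cesaro_means(geometric)[-1])
```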


§9.7 pages 446-447


1 See the proof in §2.3 that an integral converges if and only if $U(S) \to 0$ as $|S| \to 0$. This




implies that if $f, g$ are Riemann integrable, then
$f + g$, $cf$, $|f|$ and $\max(f, g)$ are Riemann integrable.


2 Given $\epsilon > 0$, let $V$ be a collection of intervals
of total length less than $\epsilon/2$ which contain all
points where $f(x) \ne g(x)$, and $W$ a collection of
total length less than $\epsilon/2$ which contain all
points where $g(x) \ne h(x)$. Then $V$ and $W$ together give a collection of total length less than $\epsilon$
containing all points where $f(x) \ne h(x)$.

3 Given $\epsilon > 0$, choose a collection of intervals
$V_n$ of total length less than $\epsilon/2^{n+1}$ such that
$f_n(x) = g_n(x)$ for $x$ outside $V_n$ ($n = 1, 2, 3, \ldots$). Let $V_\infty$ be a collection of total length less than $\epsilon/2$
such that $f_n(x) \to f(x)$ for $x$ outside $V_\infty$. Then
the collection of all intervals $V_\infty, V_1, V_2, \ldots$ has
total length less than $\epsilon$, and outside these intervals $g_n(x) = f_n(x)$ (all $n$), so $\lim_{n\to\infty} g_n(x) = \lim_{n\to\infty} f_n(x) = f(x)$.


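4 One example is
$$f_n(x) = \begin{cases} 0 & \text{if } x \le 0 \\ n^2x & \text{if } 0 \le x \le 1/n \\ 2n - n^2x & \text{if } 1/n \le x \le 2/n \\ 0 & \text{if } 2/n \le x. \end{cases}$$
Then $\lim_{n\to\infty} f_n(x) = 0$ for all $x$, but
$\int_0^\infty f_n(x)\,dx = 1$ for all $n$.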
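The "moving tent" example can be tabulated directly. This sketch evaluates $f_n$ at a few fixed points, where it is eventually 0. It also integrates $f_n$ with a midpoint rule on $[0, 2/n]$. Since $f_n$ is linear on each half and the number of panels is even, the midpoint rule is exact here up to rounding. The function names are illustrative.

```python
def tent(n, x):
    """f_n: 0 for x <= 0, n^2 x on [0, 1/n], 2n - n^2 x on [1/n, 2/n], 0 beyond."""
    if x <= 0 or x >= 2 / n:
        return 0.0
    if x <= 1 / n:
        return n * n * x
    return 2 * n - n * n * x

def tent_integral(n, steps=1000):
    """Midpoint rule over [0, 2/n], the support of f_n; exact for this piecewise-linear f_n."""
    h = (2 / n) / steps
    return h * sum(tent(n, (i + 0.5) * h) for i in range(steps))

for n in (1, 10, 100, 1000):
    print(n, [tent(n, x) for x in (0.05, 0.5, 1.5)], tent_integral(n))
```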

5 Set $f_n(x) = n\left[x^{2k}\exp(-(a + n^{-1})x^2) - x^{2k}\exp(-ax^2)\right]$. Then $f_n(x)$ lies between the
integrable (see Ex. 10(m), §9.6) functions
$x^{2k}\exp(-ax^2)(-x^2)$ and $x^{2k}\exp(-ax^2)(-x^2 + x^4)$. Since $F \le f \le G$ implies $|f| \le \max(|F|, |G|)$, the sequence $f_n$ is dominated by
an integrable function. Hence $\lim_{n\to\infty}\int_{-\infty}^{\infty} f_n\,dx = \int_{-\infty}^{\infty}(\lim_{n\to\infty} f_n)\,dx$, which gives the desired
formula.

6 Let $f_n(x)$ be 0 for $|x| < 1/n$ and $|x|^{-a}$
elsewhere. Then $\int_{-1}^{1} f_n(x)\,dx = (a - 1)^{-1}(2n^{a-1} - 2)$ if $a > 1$ and $2\log n$ if $a = 1$. In
either case $\lim_{n\to\infty}$ does not exist. If $|x|^{-a}$ were
integrable, then this limit would have to exist
(and be $\int_{-1}^{1}|x|^{-a}\,dx$).

7 Let $A_n(x, y)$ be 0 if $x^2 + y^2 < 1/n$ and
$r^{-a}$ if $x^2 + y^2 \ge 1/n$. If $m > n$, then the
integral of $A_m - A_n$ is the integral of $r^{-a}$ over
the annulus $\{m^{-1} \le x^2 + y^2 \le n^{-1}\}$, which is
$\iint r^{-a}\,r\,dr\,d\theta = 2\pi(2 - a)^{-1}\left[n^{(a-2)/2} - m^{(a-2)/2}\right]$. This proves convergence for $a < 2$. The proof
of divergence for $a \ge 2$ is similar to 6.
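The annulus integral in 7 has the closed form given above. Below is a sketch of a midpoint-rule check in polar coordinates, integrating $r^{-a}\cdot r$ in $r$ and multiplying by $2\pi$ for $\theta$. The last loop fixes $n = 1$. For $a = 1.5$ the closed form settles down as $m$ grows, and for $a = 2.5$ it grows without bound. The particular exponents and radii are arbitrary.

```python
import math

def annulus_midpoint(a, n, m, steps=100_000):
    """Midpoint value of the integral of r^(-a) over {1/m <= x^2 + y^2 <= 1/n}, in polar coordinates."""
    inner, outer = m ** -0.5, n ** -0.5
    h = (outer - inner) / steps
    return 2 * math.pi * h * sum((inner + (i + 0.5) * h) ** (1 - a) for i in range(steps))

def annulus_closed_form(a, n, m):
    """2 pi (2 - a)^(-1) [n^((a-2)/2) - m^((a-2)/2)], valid for a != 2."""
    return 2 * math.pi / (2 - a) * (n ** ((a - 2) / 2) - m ** ((a - 2) / 2))

print(annulus_midpoint(1.5, 4, 100), annulus_closed_form(1.5, 4, 100))
for m in (10**2, 10**4, 10**6, 10**8):
    print(m, annulus_closed_form(1.5, 1, m), annulus_closed_form(2.5, 1, m))
```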


8 Integrable if and only if a < 3 


9 Given $\epsilon > 0$, choose $K$ so large that
$\int_{-K}^{K} F(x)\,dx$ differs from $\int_{-\infty}^{\infty} F(x)\,dx$ by less
than $\epsilon/3$. Choose $N$ so large that $\int_{-K}^{K} f_n(x)\,dx$
differs from $\int_{-K}^{K} f(x)\,dx$ by at most $\epsilon/3$ whenever $n > N$ (by the Dominated Convergence
Theorem for proper integrals). Then $\int_{-\infty}^{\infty} f_n(x)\,dx$
differs from $\int_{-\infty}^{\infty} f_m(x)\,dx$ by less than $\epsilon$ whenever
$n, m > N$. Thus $\lim_{n\to\infty}\int_{-\infty}^{\infty} f_n(x)\,dx$ exists.
Moreover, if $K' > K$, then $\int_{-K'}^{K'} f(x)\,dx$ differs
from this limit by at most $\epsilon$, and the theorem
follows.


10 $\int |f_m(x) - f_n(x)|\,dx = \int (f_m - f_n)\,dx = \int f_m\,dx - \int f_n(x)\,dx$ whenever
$m > n$. By assumption, $\int f_n(x)\,dx$ is a bounded
increasing sequence. It therefore converges, and
$\int (f_m - f_n)\,dx < \epsilon$ for $n, m > N$. The
result then follows from the completeness
theorem.


11 Since Riemann integrable functions are 
Lebesgue integrable this is a special case of the 
Dominated Convergence Theorem. 

12 Let $g_n(x) = \lim_{j\to\infty}\min(f_n, f_{n+1}, \ldots, f_{n+j})$. This limit exists for all $x$ because it is the limit of a
decreasing sequence of non-negative numbers.
Each $g_n$ is the dominated (by $f_n$) limit of integrable functions, hence $\int g_n\,dx$ exists and is
$\le K$. The sequence $g_1, g_2, \ldots$ is increasing,
hence, as in 10, it satisfies the Cauchy Criterion,
and there is an integrable $g_\infty$ such that $\int |g_\infty - g_n|\,dx \to 0$ and $g_n(x) \to g_\infty(x)$ for almost all $x$.
Since $g_n(x) \to f(x)$ for almost all $x$, it follows
that $g_\infty(x) = f(x)$ for almost all $x$ and that
$\int f(x)\,dx = \lim_{n\to\infty}\int g_n(x)\,dx \le K$.


§9.8 pages 452-455


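1 Given $\epsilon > 0$, there is a $\delta$ such that $|y - f(\bar x)| < \delta$ implies $|g(y) - g(f(\bar x))| < \epsilon$. But
there is a $\delta_1$ such that $|x - \bar x| < \delta_1$ implies
$|f(x) - f(\bar x)| < \delta$. Thus $|x - \bar x| < \delta_1$ implies
$|g(f(x)) - g(f(\bar x))| < \epsilon$.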

2 Given $\epsilon > 0$, there is a $\delta > 0$ such that
$y, y'$ in $Y$ and $|y - y'| < \delta$ imply $|g(y) - g(y')| < \epsilon$. But there is a $\delta_1$ such that $x, x'$ in
$X$ and $|x - x'| < \delta_1$ imply $|f(x) - f(x')| < \delta$.
Thus $|x - x'| < \delta_1$ implies $|g(f(x)) - g(f(x'))| < \epsilon$.

3 Let $L: E \to F$ denote the derivative of $f$ at $\bar x$,
and $M: F \to G$ the derivative of $g$ at
$f(\bar x)$. Then $\left((g(f(\bar x + sh)) - g(f(\bar x)))/s\right) - M(L(h)) = \left((g(f(\bar x) + sy) - g(f(\bar x)))/s\right) - M(L(h))$, where $y = (f(\bar x + sh) - f(\bar x))/s$.
Since $y \to L(h)$ as $s \to 0$, $y$ is in a bounded set,
and for every $\epsilon > 0$ there is a $\delta$ such that
$s^{-1}[g(f(\bar x) + sy) - g(f(\bar x))]$ differs from
$M(y)$ by less than $\epsilon/2$ whenever $|s| < \delta$. The
continuity of $M$ implies that $M(y)$ differs by less




than $\epsilon/2$ from $M(L(h))$ for $s$ small, and the
result follows.


4 $L_x(ah) = \lim_{s\to 0}((f(x + sah) - f(x))/s) = \lim_{s\to 0} a((f(x + sah) - f(x))/sa) = a\lim_{t\to 0}((f(x + th) - f(x))/t) = aL_x(h)$. Since the
functions $s^{-1}[f(x + uh + sh' + tk) - f(x + uh)]$ and $s^{-1}[f(x + sh') - f(x)]$ are
uniformly continuous in $s, u, t, x, h, h', k$, their
difference is uniformly continuous on the set
$s = u$. Therefore it has an extension to $s = t = 0$,
which is the limit. The function is defined and
independent of $k$ for $t = 0$, hence the same is
true of the limit. Take k = ¿(h + h’) and 
t = —%s. Adding and subtracting f(x) in 
each of the 4 terms shows that the limit is 
gh’) + Lz(—gh + gh’) = 0 by L,(ah) = aL,(h). 
Then setting $t = 0$ and adding and subtracting
$f(x)$ gives $L_x(h + h') - L_x(h) - L_x(h') = 0$, as
desired.

5 If the condition is fulfilled, then the map is
uniformly continuous, because $|L(x) - L(x')| = |L(x - x')| \le B|x - x'|$ can be made $< \epsilon$ by
making $|x - x'| < \epsilon/B$. If the map is continuous,
then there is a $\delta$ such that $|x - 0| < \delta$ implies
$|L(x) - L(0)| < 1$, i.e. $|x| < \delta$ implies $|L(x)| < 1$.
Then for any $x' \ne 0$, $|\delta x'/|x'|| = \delta$, so (by continuity) $|L(\delta x'/|x'|)| \le 1$, $(\delta/|x'|)|L(x')| \le 1$, $|L(x')| \le \delta^{-1}|x'|$, and
the condition is fulfilled with $B = \delta^{-1}$.

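6 Let $\delta_1 = (1, 0, 0, \ldots, 0)$, $\delta_2 = (0, 1, 0, \ldots, 0)$, $\ldots$, $\delta_n = (0, 0, 0, \ldots, 1)$, and let $B$ be a
number such that $B/n \ge |L(\delta_i)|$ for $i = 1, 2, \ldots, n$. Then, for $x = (x_1, x_2, \ldots, x_n)$,
$|L(x)| = |L(\sum x_i\delta_i)| = |\sum x_iL(\delta_i)| \le \sum|x_i|\,|L(\delta_i)| \le B|x|_\infty$, hence $L$ is continuous.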

7 Elementary. $R \times R \times \cdots \times R$ is $R^n$ with
the norm $|x|_\infty$.

8 If $|x|_\infty = 1$, then $|M_1x + M_2x|_\infty \le |M_1x|_\infty + |M_2x|_\infty \le \max|M_1x| + \max|M_2x| = |M_1| + |M_2|$, hence $|M_1 + M_2| \le |M_1| + |M_2|$. The other axioms are easily proved. If
$(Mx)_i = \sum_j a_{ij}x_j$, then $|M| = \max_i(|a_{i1}| + |a_{i2}| + \cdots + |a_{in}|)$.
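The formula $|M| = \max_i(|a_{i1}| + \cdots + |a_{in}|)$ for the norm induced by $|x|_\infty$ can be checked by brute force on a small matrix. Because $x \mapsto |Mx|_\infty$ is convex, its maximum over the cube $|x|_\infty \le 1$ is attained at one of the $2^n$ vertices $(\pm 1, \ldots, \pm 1)$. The sample matrix and the function names are arbitrary.

```python
from itertools import product

def inf_norm(v):
    """|v|_inf = max |v_i|."""
    return max(abs(c) for c in v)

def max_row_sum(M):
    """max_i (|a_i1| + ... + |a_in|), the formula from Exercise 8."""
    return max(sum(abs(a) for a in row) for row in M)

def operator_norm_bruteforce(M):
    """Maximum of |Mx|_inf over the 2^n vertices x of the cube |x|_inf <= 1."""
    n = len(M[0])
    best = 0.0
    for signs in product((-1.0, 1.0), repeat=n):
        Mx = [sum(a * s for a, s in zip(row, signs)) for row in M]
        best = max(best, inf_norm(Mx))
    return best

M = [[1, -2, 3], [0.5, 4, -1], [-2, -2, -2]]
print(operator_norm_bruteforce(M), max_row_sum(M))
```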

9 $|M|$ is easily seen to be a norm. Let $M_1, M_2, \ldots$ be a sequence of elements of $L(E, F)$
which satisfies the Cauchy Criterion $|M_n - M_m| \to 0$. For each $x$ in $E$, $|M_nx - M_mx| = |(M_n - M_m)x| \le |M_n - M_m|\,|x| \to 0$. Therefore $M_nx$ is a convergent sequence in $F$.
Define $M_\infty x$ to be its limit. Then $M_\infty: E \to F$
is a well-defined function. It is clearly linear,
so it suffices to show that there is a $B$ such
that $|M_\infty x| \le B|x|$. For this it suffices to choose
$B$ such that $|M_n| \le B$ for all $n$.


10 The Holder inequality (see §5.4) 


11 $|M_2(M_1x)| = |M_1x|\,|M_2y|$, where $y = |M_1x|^{-1}(M_1x)$, so that $|y| = 1$. (If $|M_1x| = 0$,
then $M_2(M_1x) = 0$.) Thus $|M_2(M_1x)| \le |M_1|\,|M_2|$ for all $x$ satisfying $|x| = 1$, and
$|M_2 \circ M_1| \le |M_1|\,|M_2|$ follows.


12 Given $M$ such that $|ML - I| < 1$, define
a sequence $M_n$ in $L(E, E)$ by $M_0 = M$, $M_{n+1} = 2M_n - M_nLM_n$. Then $(I - M_{n+1}L) = (I - M_nL)^2 = P^{2^{n+1}}$, where $P = I - ML$. Consequently $M_{n+1} = M_n + (I - M_nL)M_n = (I + P^{2^n})M_n = (I + P^{2^n})(I + P^{2^{n-1}})\cdots(I + P)M$. The existence of the limit
$M_\infty = \lim_{n\to\infty} M_n$ is easily proved by proving
the convergence of the product $(I + P^{2^n})\cdots(I + P^2)(I + P)$ or, what is the same, of the
series $I + P + P^2 + P^3 + \cdots$. Letting
$n \to \infty$ in $M_nL = I - P^{2^n}$ gives $M_\infty L = I$. An
analogous argument shows that if there is an $N$
such that $|LN - I| < 1$, then there is an $N_\infty$
such that $LN_\infty = I$. Finally, $M_\infty = M_\infty I = M_\infty LN_\infty = IN_\infty = N_\infty$, hence $L$ is invertible with
inverse $L^{-1} = M_\infty = N_\infty$.
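The iteration $M_{n+1} = 2M_n - M_nLM_n$ in 12 is the Newton–Schulz iteration for a matrix inverse. A pure-Python sketch for a $3 \times 3$ matrix follows. It measures $|\cdot|$ with the max-row-sum norm of Exercise 8. The starting guess $M_0$, the inverse of the diagonal part of $L$, is an arbitrary choice that happens to satisfy $|M_0L - I| < 1$, and the helper names are illustrative.

```python
def matmul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def norm_inf(A):
    """Operator norm for |x|_inf: the maximum absolute row sum (Exercise 8)."""
    return max(sum(abs(a) for a in row) for row in A)

def newton_schulz_inverse(L, M0, steps=10):
    """Iterate M_{n+1} = 2 M_n - M_n L M_n from M0, assuming |M0 L - I| < 1."""
    I = identity(len(L))
    if norm_inf(sub(matmul(M0, L), I)) >= 1:
        raise ValueError("starting guess must satisfy |M0 L - I| < 1")
    M = M0
    for _ in range(steps):
        M = sub(scale(2.0, M), matmul(matmul(M, L), M))
    return M

L = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
M0 = [[0.25, 0.0, 0.0], [0.0, 1.0 / 3.0, 0.0], [0.0, 0.0, 0.5]]  # inverse of diag(L)
Minv = newton_schulz_inverse(L, M0)
print(matmul(Minv, L))
```

Because $I - M_nL = P^{2^n}$, the error is squared at every step, so ten steps are far more than this example needs.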


13 Given $L_1$ near $L$, let $\delta = |L - L_1|$, $P = I - L^{-1}L_1$, $\rho = |P| \le |L^{-1}|\delta$, $M_0 = L^{-1}$, $M_{n+1} = 2M_n - M_nL_1M_n$. Then, as in 12, the sequence
$M_n$ converges to $M_\infty = L_1^{-1}$ and satisfies
$|M_n - M_0| = |(P + P^2 + \cdots + P^{2^n - 1})M_0| \le (\rho/(1 - \rho))|M_0|$, hence $|L_1^{-1} - L^{-1}| \le (\rho/(1 - \rho))|L^{-1}|$. This can be made small by
making $\delta$ small, q.e.d.


14 For any given $y$, the sequence of successive approximations is $x_0 = \bar x$, $x_{n+1} = x_n + M(y - f(x_n))$, i.e. $\Delta x = M(\text{desired } \Delta y)$. The
proof of convergence turns on the inequality
$|f(x') - f(x) - L_{\bar x}(x' - x)| \le \epsilon|x' - x|$, i.e.
$|\Delta y - L_{\bar x}(\Delta x)| \le \epsilon|\Delta x|$ for $x', x$ near $\bar x$. This
can be proved by parameterizing the line
segment from $x$ to $x'$ by $x(t) = x + t(x' - x)$
on the interval $0 \le t \le 1$. By the Fundamental
Theorem of Calculus, $\Delta y = \int_0^1 (dy/dt)\,dt = \int_0^1 L_{x(t)}(\Delta x)\,dt$, where the integral is the integral
of a 1-form on $\{0 \le t \le 1\}$ with values in $F$.
Given $\epsilon > 0$, there is a $\delta > 0$ such that
$|x - \bar x| < \delta$ implies $|L_x - L_{\bar x}| < \epsilon$. Therefore
$|L_{x(t)}(\Delta x) - L_{\bar x}(\Delta x)| \le \epsilon|\Delta x|$, hence $|\Delta y - L_{\bar x}(\Delta x)| \le \epsilon|\Delta x|$, as desired, provided $|x' - \bar x| < \delta$
and $|x - \bar x| < \delta$. Let $\rho < 1$ be given. Choose
$\delta$ such that $|\Delta y - L_{\bar x}(\Delta x)| \le \rho|M|^{-1}|\Delta x|$
whenever $x', x$ are within $\delta$ of $\bar x$. Then for any
$y$ satisfying $|M(y - \bar y)| < (1 - \rho)\delta$, the successive approximations satisfy $x_{n+1} - x_n = x_n - x_{n-1} - M(f(x_n) - f(x_{n-1})) = \Delta x$




$- M(\Delta y) = M(L_{\bar x}(\Delta x) - \Delta y)$, which gives $|x_{n+1} - x_n| \le \rho|x_n - x_{n-1}|$ and hence $|x_{n+1} - \bar x| \le |x_{n+1} - x_n| + \cdots + |x_1 - x_0| \le (\rho^n + \rho^{n-1} + \cdots + \rho + 1)|M(y - \bar y)| < \delta$, as in
§7.1. Thus the sequence $x_n$ satisfies the Cauchy
Criterion for all $y$ sufficiently near $\bar y$. Its limit
satisfies $x_\infty = x_\infty + M(y - f(x_\infty))$, hence
$y = f(x_\infty)$. Set $x_\infty = g(y)$. For $\rho$ small, $g(y)$
differs from $x_1 = \bar x + M(y - \bar y)$ by at most a constant
times $\rho(1 - \rho)^{-1}$ times $|\Delta y|$. This implies that $g$ is
differentiable at $\bar y$ with derivative $M$. For $x$ near
$\bar x$ the derivative $L_x$ is near $L_{\bar x}$, hence by 13 the
inverse $M_x = L_x^{-1}$ exists and is continuous.
Therefore $g$ is differentiable at $f(x)$ with
derivative $M_x$, by the proof above. Therefore $g$
is continuous. Thus its derivative $L_{g(y)}^{-1}$ is continuous, q.e.d.
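The successive approximations $x_{n+1} = x_n + M(y - f(x_n))$, with $M = L_{\bar x}^{-1}$ held fixed, can be run on a concrete example. The map `f` below is hypothetical, chosen only because its derivative at $\bar x = (0, 0)$ is easy to invert. The target $y$ near $\bar y = f(\bar x) = (0, 0)$ is arbitrary, and the helper names are illustrative.

```python
import math

def f(x):
    """Hypothetical smooth map R^2 -> R^2, used only for illustration."""
    x1, x2 = x
    return (x1 + 0.1 * math.sin(x2), x2 + 0.1 * x1 * x1)

def jacobian(x):
    """Matrix of partial derivatives of f at x (the derivative L_x)."""
    x1, x2 = x
    return ((1.0, 0.1 * math.cos(x2)), (0.2 * x1, 1.0))

def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def apply(A, v):
    return tuple(sum(a * vi for a, vi in zip(row, v)) for row in A)

def local_inverse(y, x_bar, iterations=60):
    """Successive approximations x_{n+1} = x_n + M(y - f(x_n)) with M = L_{x_bar}^{-1} fixed."""
    M = inverse_2x2(jacobian(x_bar))
    x = x_bar
    for _ in range(iterations):
        residual = tuple(yi - fi for yi, fi in zip(y, f(x)))
        x = tuple(xi + di for xi, di in zip(x, apply(M, residual)))
    return x

y = (0.05, -0.03)
x = local_inverse(y, (0.0, 0.0))
print(x, f(x))
```

Unlike Newton's method, this variant never recomputes the derivative. That is why, as in the argument above, the convergence is linear with ratio $\rho$ rather than quadratic.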


15 Apply the Inverse Function Theorem to
the map $E_1 \times E_2 \to F \times E_2$ defined by
$(x_1, x_2) \mapsto (f(x_1, x_2), x_2)$. Alternatively,
follow the proof of §7.1.


16 Given a $k$-form $A\,dx_{i_1}\cdots dx_{i_k}$ (in the
sense of §4.2) and given a $k$-tuple $(v_1, v_2, \ldots, v_k)$ of elements of $R^n$, define the 'value' of the
given $k$-form on the given $k$-tuple to be the
coefficient of $du_1\,du_2\cdots du_k$ in the pullback
of the given $k$-form under the map $(u_1, u_2, \ldots, u_k) \mapsto u_1v_1 + u_2v_2 + \cdots + u_kv_k$ of
$R^k$ to $R^n$.
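Under this definition, the value of $dx_{i_1}\cdots dx_{i_k}$ on $(v_1, \ldots, v_k)$ is the determinant of the $k \times k$ matrix formed from the coordinates $i_1, \ldots, i_k$ of the vectors. The sketch below computes it as described. It expands the pullback of each $dx_i$, which is $\sum_j (v_j)_i\,du_j$, and reorders each $du$-monomial into $du_1\cdots du_k$. Indices are 0-based, which is a Python convention rather than the text's, and the function names are illustrative.

```python
from itertools import permutations

def permutation_sign(perm):
    """Sign of a permutation of 0..k-1, computed by counting inversions."""
    sign = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                sign = -sign
    return sign

def form_value(A, indices, vectors):
    """Value of A dx_{i_1}...dx_{i_k} on (v_1, ..., v_k), indices 0-based.

    The pullback of dx_i under (u_1, ..., u_k) -> u_1 v_1 + ... + u_k v_k is
    sum_j v_j[i] du_j.  Multiplying the k pulled-back 1-forms, only the terms
    du_{perm[0]} ... du_{perm[k-1]} with distinct indices survive, and each
    equals sign(perm) du_1 ... du_k.
    """
    total = 0.0
    for perm in permutations(range(len(indices))):
        term = permutation_sign(perm)
        for position, j in enumerate(perm):
            term *= vectors[j][indices[position]]
        total += term
    return A * total

print(form_value(1, (0, 1), [(1, 2, 5), (3, 4, 7)]))  # -2.0, the determinant of [[1, 3], [2, 4]]
```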


17 $L^*(\varphi)$ for $\varphi$ in $A_k(E_2, F)$ is the function
which assigns to a $k$-tuple $(v_1, v_2, \ldots, v_k)$ of
elements of $E_1$ the element $\varphi(Lv_1, Lv_2, \ldots, Lv_k)$
of $F$. The Chain Rule is immediate from this
definition.


18 $S$ is a subset of $E$ which can be described
by a finite number of charts $R^k \to E$ satisfying
the conditions of Chap. 6. The approximating
sums give elements of $F$, hence the limit is an
element of $F$.

19 By (i), $|0 \cdot x| = 0$ and $|-x| = |x|$. Thus by
(ii), $0 = |x + (-x)| \le |x| + |-x| = 2|x|$, and
division by 2 gives $|x| \ge 0$, as desired.

20 Let $\delta_i$ be the element of $R^n$ which is 1
in the $i$th coordinate and 0 in the others. Choose
$B$ so that $|\delta_i| \le B$ for $i = 1, 2, \ldots, n$. Then $|x| = |(x_1, x_2, \ldots, x_n)| = |\sum x_i\delta_i| \le \sum|x_i|B \le nB\max\{|x_i|\} = C|x|_\infty$, where $C = nB$. Algebraically, the space of all linear maps $\varphi: R^n \to R$
is the same as $R^n$. Let $|\varphi|^*$ be the norm defined
by Exercise 9 and the given norm $|x|$, i.e.
define $|\varphi|^*$ to be the least real number such that
$|\varphi(x)| \le |\varphi|^*|x|$ for all $x$ in $R^n$. Then $|\varphi|^*$ is a
norm, so, by the first part, there is a $C$ such that
$|\varphi|^* \le C|\varphi|_\infty$. Thus $|\varphi(x)| \le C|\varphi|_\infty|x|$ for all
$\varphi, x$. But $|x|_1$ is the least real number such that
$|\varphi(x)| \le |\varphi|_\infty|x|_1$ for all $\varphi$. Hence $|x|_1 \le C|x|$,
so $|x| \ge C^{-1}|x|_1 \ge \text{const.}\,|x|_\infty$, as desired. This
proves that if $E$ is any finite dimensional Banach
space and if $R^n \to E$ is any basis of $E$, then a map
$f: E \to F$ is continuous if and only if the corresponding map $R^n \to F$ is continuous in the
norm $|x|_\infty$. Hence the last statement follows
from Exercise 6.


21 $L$ is continuous by Ex. 5 because $|Lx| = |x|$. Let $M$ be the operator 'shift left', i.e. the
operator $M(x_1, x_2, x_3, \ldots) = (x_2, x_3, x_4, \ldots)$.
Then $M$ is continuous because $|Mx| \le |x|$. Also
$ML = I$. However, $L$ is not invertible, because
the equation $Lx = (1, 0, 0, \ldots)$ has no solution
$x$ in $E$.


Index


Abel summation 412, 426 (Ex. 13) 
Abel’s Theorem 411-412 
Absolute convergence 409, 439n 
Addition formulas 
for trigonometric functions 252 (Ex. 1(e)) 
for e” 253 (Ex. 2(b)) 
for general exponentials 256 (Ex. 10) 
Affine manifolds 129 
Affine maps 8, 86-87 
decomposed as a composition of simple maps
10-11, 15 (Ex. 8), 104-105 (Ex. 3), 199,
200 (Ex. 4)
Algebra of forms, the 90n 
Algebraic functions 381-383 
Almost everywhere convergence 435 
Alternating series test 377 (Ex. 9) 
Analytic functions 299 
Approximating sums 25-26, 29, 197-198, 208-213,
400, 426
Archimedean laws 366, 373, 377 (Ex. 11) 
Arcsine function 140 (Ex. 8), 386 (Ex. 3) 
Arctan 146-147 
Arithmetic mean 184 (Ex. 6) 
Arithmetic modulo n, 376 (Ex. 3-5) 
Arzela’s Theorem 447 (Ex. 11) 
Atlas 203 


Banach spaces 448 
Basis of a vector space 115-117 
Betti numbers 327 (Ex. 10) 
Binomial coefficients 90n, 304, 424-425 (Ex. 11) 
Binomial series 304-306 
Bolzano-Weierstrass Theorem 393 
Boundary of a surface-with-boundary 215, 223 
(Ex. 1) 
Boundedness 
of domains 31 
of functions 31 


Cancellation on interior boundaries as the underlying idea of the Fundamental Theorem 59,
65-66, 72
Canonical form for linear maps 119 
Cauchy Convergence Criterion 456-457 
in definition of integrals 30 
in definition of k-dimensional volume 197 
in successive approximations 230 
for complex numbers 292, 310 (Ex. 2) 
in definition of real numbers 368 
Cauchy-Riemann equations 283, 296 
Cauchy’s Integral Formula 298 
Cauchy’s polygon 254 (Ex. 5) 
Cauchy’s Theorem 296 
Cesaro summation 426 (Ex. 14) 
Chain rule 143 
in proof of Stokes’ Theorem 62, 69-70, 74 
statement for affine maps 87 
proof for affine maps 87-90 
in matrix notation 101-102 
in Jacobian notation 102 
principal statement 143 
in terms of Jacobians 144 
proof 153-156 
reviewed 190 
for complex functions 311 (Ex. 10) 
Closed forms 63 (Ex. 8), 71-72 (Ex. 7-10), 274, 
320-326 
Compactness 
definition 392 
related theorems 393-398 
Completeness 
of real number system 373-375 
of Lebesgue integrable functions 442 
of Banach spaces 448 
Complex numbers 289-291 
Computational rules 
governing forms and pullbacks 11-13, 20, 
42-43, 86, 88-89, 142-143 
governing the computation of derived forms 60, 
68, 73, 220 


H.M. Edwards, Advanced Calculus: A Differential Forms Approach, Modern Birkhauser Classics, 


DOI 10.1007/978-0-8176-8412-9, © Harold M. Edwards 2014 


Index | 505 


Conditional convergence 410n
Conductivity 334 (Ex. 16), 399 (Ex. 5)
Conformal coordinates 283-286
Confusion 26
Conjugate of a complex number 309
Conservative force fields 64 (Ex. 9)
Constraints
examples 164-168
non-singular 168
Constructive mathematics 463-466
Content 197n
Continued fractions 377-379 (Ex. 12-15)
Continuity
of a force field 24
of a function of two variables 31
of k-forms 143
of complex functions 293
main definition 390
of maps between Banach spaces 451
Continuity equation 332
Convergence (see Cauchy Convergence Criterion)
for rational numbers 365
Coulomb’s Law 335
Cramer’s Rule 109
Critical points 170 (Ex. 4), 218
Cross products of vectors 266 (Ex. 5-7)
Curl 267
Curves (see also Manifolds)
defined by parameters 39
defined by equations 39
D’Alembertian 347
Decimal approximation 27 (Ex. 2), 380 (Ex. 17)
Decimal fractions 366-367, 380 (Ex. 19)
Determinants
definition 101
reason for name 101n
method of evaluation 103-104 (Ex. 2)
Dielectric constant 241
Differentiability
of k-forms 73
of maps 143
reviewed 190
of complex functions 293
main definition 390
of maps between Banach spaces 451
Differentiable manifolds 192
Differential equations
fundamental existence theorem 245-251, 320 (Ex. 5)
elementary techniques of solution 270-276
stated in terms of differentials 272, 313-314
describing families of curves 277 (Ex. 2)
generalized existence theorem 315
Differential forms 73 (see also Forms)
Differentiation under the integral sign 392
Directional derivatives 149 (Ex. 7)
Dirichlet problem 288
Div (divergence) 268
Divergence of a flow 61
Divergence Theorem 73, 268
Dot product 172, 269 (Ex. 5)
Double series 407-409
Electrical forces (see Coulomb’s Law)
Electromagnetic field 343
Elementary functions 384
Elimination
of variables in linear equations 76-79
Elimination Theorem
statement 158
in proof of Implicit Function Theorem 158
reviewed 191
proof 226-232
Envelopes 194 (Ex. 7)
Equality of mixed partials 63 (Ex. 6), 74-75
Euclidean algorithm 376 (Ex. 6)
Euler 402-404
Evaluation of 2-forms 12-13
Exact differential equations 274
Exact forms 63 (Ex. 8), 71-72 (Ex. 7-10), 273, 320-326
Exact sequences of affine maps 131 (Ex. 4)
Exponential function 253 (Ex. 2), 311 (Ex. 8)
Exponentials of matrices 253 (Ex. 4), 254 (Ex. 10)
Exterior derivatives 73
Exterior powers of matrices (see Matrices)
Factorial function 421-425 (Ex. 10, 11)
Faraday’s Law of Induction 71 (Ex. 5), 342
Flows
constant planar flows 2-4
constant spatial flows 5-7
relation between direction of flow and corresponding (n - 1)-form 21 (Ex. 5, 6)
planar flow from unit source 23 (Ex. 2)
spatial flow from unit source 24 (Ex. 3), 288 (Ex. 2)
general discussion 328-333
Folium of Descartes 141 (Ex. 9), 193 (Ex. 3)
Force (see Work)
Forms
basic definitions 88
non-constant forms 142
on Banach spaces 452, 454-455 (Ex. 16-18)
Fredholm Alternative 126 (Ex. 12)
Functions (see Algebraic functions, Elementary




functions, Maps, Transcendental functions, 
Polynomials) 
Fundamental Theorem of Algebra 306-307, 312 
(Ex. 14) 
Fundamental Theorem of Calculus 
statement 52 
uses 52-53 
proof 53-55 
related to independence of parameter 58 (Ex. 13) 
as the case k = 0 of Stokes’ Theorem 73 
in vector notation 268 
related to Poincaré’s Lemma 327 (Ex. 9) 
reviewed 391 (Ex. 4) 


Gamma function 423 (Ex. 10(n)) 

Gauss and the lattice point problem 35 (Ex. 2) 

Gauss-Seidel iteration 240 (Ex. 2, 3), 241 (Ex. 6), 
242 (Ex. 13) 

Gauss’ Theorem (see Divergence Theorem) 

Geometric mean 184 (Ex. 6) 

Goldbach conjecture 464-465 

Golden section 379n 

Grad (gradient) 267 

Gravity (see Newton’s Law of Gravity) 

Green’s Theorem in the Plane 73 


Harmonic functions 
definition 278 
as solutions of Laplace’s equation 278 
analyticity of 308 
Heat capacity 334 
Heat equation 333 
Heine-Borel Theorem 205, 393 
Holder inequality 174-175 
Homology 
simple cases 75 (Ex. 5-6) 
homology theory 320 
homology basis 323-324 
Hyperbolic functions 253 (Ex. 3) 


Image of a map 81
Implicit Function Theorem 
for affine maps 105-108 
for differentiable maps 133-134 
examples 134-139 
proof reduced to Elimination Theorem 157-159 
reviewed 191 
proof 232-234 
for analytic functions 307-308 
for maps between Banach spaces 451-452 
Implicit differentiation 
description of the method 144-145 
examples 145-147, 150-151 (Ex. 9-10) 
proof 156-157 
Improper integrals 401 
Independence of parameter
for surface integrals 47
statement 207 
proof 208-213 
general case 222 
Integers 366 
Integrability conditions 315 
Integrals (see also Improper integrals, Lebesgue
integration) 
intuitive description of meaning 24-27 
defined as a limit of sums 30-31 
difficulty of defining surface integrals 44-48 
basic properties 49-51 
double integrals as iterated integrals 51 
as area under a curve 56 (Ex. 4), 65 (Ex. 12)
main theorem 204-213 
general properties 219-223 
as functions of S 224-225 
of complex 1-forms 294-295 
definition reviewed 400 
Integrating factors 275 
Integration by parts 222 
Intermediate Value Theorem 159n 
Inverse Function Theorem 454 (Ex. 14, 15) 
Inverse matrix 
formula for 111-112 
Isoperimetric inequality 188-190 (Ex. 19) 
Iteration 240 (Ex. 1) 


Jacobians 100 


Lagrange multipliers 161 
statement of method 169 
examples 166-169, 170-190 
restatement using differentiable manifolds 
192-193 
Laplace’s equation 278, 288 (Ex. 2) 
Laplacian 334 
Lebesgue Dominated Convergence Theorem 
437-438 
Lebesgue integration 
examples and motivation 426-431 
proof 431-435 
main theorem 436 
passing to a limit under the integral sign 
437-441 
completeness of space of Lebesgue integrable 
functions 442-446 
Leibniz notation 458-460 
Leibniz’s formula 419 (Ex. 1) 
Level surfaces 80n 
Lexicographic order for k-forms 90-91 
Light as an electromagnetic phenomenon 348 
Line integrals 265-266 
Linear maps 117-122, 123 
Lines of force 71 (Ex. 6) 
Liouville’s Theorem 311 (Ex. 9) 




Logarithm function 386 (Ex. 2) 
Lorentz transformations 350 


Magnetic forces 341 
Magnetic permeability 345 
Manifolds 
affine 129 
usefulness of term 129n 
differentiable 192 
compact, oriented, differentiable surfaces 203 
compact, oriented, differentiable surfaces-with- 
boundary 214 
compact, oriented, differentiable manifolds-with- 
boundary 219 
solving differential equations 314 
Mappings (see Maps) 
Maps 
origin of term 8 
affine maps 8 
differentiable maps 143 
linear maps 117-122, 123 
Mass and energy 351-354 
Matrices 
of coefficients of an affine map 94 
products 95-97 
exterior powers 97-100 
transposes 98 
minors of 101 
formula for inverse 111-112 
Maxima and minima 160-183 
Maxwell’s equations of electrodynamics 340-348 
Mesh size (of a subdivision) 30 
Microscope 
describing meaning of differentiability 151-153 
in proof of independence of parameter 211-212 
related to integrals 222 
in rate of convergence of successive 
approximations 234 (Ex. 2) 
Minkowski’s Inequality 185 (Ex. 9), 448 
Modulo (see Arithmetic modulo n) 
Modulus of a complex number 291 


Natural numbers 357-359
decimal notation 358
Newton on ‘action at a distance’ 340n
Newton’s Law of Gravity 23 (Ex. 1), 335
Newton’s method
statement 242-243
error estimates 243-244
in computation of nth roots 262-263
Newton’s theorems on the gravitational field of a spherical shell 336-337

Norms on vector spaces 447-448 


One-to-one 81n
Onto as an adjective 79n 


Oriented area 6, 20 
formula for 14 (Ex. 5) 
Oriented length 19, 20 
Oriented volume 16-18, 20 
Orientations 
of planar flows 3 
of spatial flows 5 
of space (right- or left-handed) 17 
of integrals 29, 29n 
of a surface by a non-zero 2-form 45 
of the boundary of an oriented solid 66-67 
of n-space 131 (Ex. 3) 
of the boundary of an oriented manifold-with- 
boundary 219-220 
of integrals over manifolds 221-222 
Orthogonal matrices 193-194 (Ex. 5-6) 
Orthogonal trajectories 277 (Ex. 3) 


Parabola of safety 195 (Ex. 7) 
Parameterization 44-48 
Parameterization of the circle 257-258 
Parametric description of domains 38-43 
Partial derivatives 41 
definition 142 
Partition of unity 
continuous 206 
differentiable 213 (Ex. 1), 217, 220 
Pi 
definition 34 (Ex. 1) 
represented as a simple integral 65 (Ex. 14) 
computation of 256-260 
Picard’s iteration 246 
Poincaré’s Lemma 325 
Poisson’s equation 339, 341, 348 
Poisson’s integral formula 287, 308-310 
Polar coordinates 47, 64-65 (Ex. 10-11) 
differentiation of 146-147 
Polynomials 381 
Potential theory 335-340 
Power series 
operations with 420-421 (Ex. 6-9)
Product formula for sine 402, 415-417 
Product of forms 89 
Products (infinite) 401 
Products of matrices (see Matrices) 
Ptolemy’s trigonometric tables 379-380 (Ex. 16) 
Pullbacks 
origin of term 11 
geometrical significance 8-12, 18 
computation of 12-13 
of 3-forms 15-16 
of constant forms under affine maps 20, 86 
of non-constant forms under non-affine maps 
40-43, 142-143 
reviewed 190 
related to Leibniz notation 460 




Quadratic forms 
origin of term 175n 
elliptic, hyperbolic, parabolic 177-178 
positive definite 183, 186-187 (Ex. 15-16) 


Rank 
of a system of linear equations 81 
geometrical significance 82-83 
relation to pullbacks 84 
of a linear map 119, 124
of an affine map 128 
of the exterior powers of a matrix 130 (Ex. 2) 
of a differentiable map (at a point not a 
singularity) 139 
reviewed 191 
Ratio test 401 
Rational numbers 359-367 
Real numbers 367-375 
Rearrangement of series 405 (Ex. 6-9), 409 
Refinement (of a subdivision) 32 
Relativity (see Special relativity) 
Relaxations 238 
Riemann integration 426-427 
Riemann’s criterion for the convergence of an 
integral 37 (Ex. 7) 


Schwarz inequality 172 
Second derivative test 187 (Ex. 18) 
Separation of variables 273 
Series (infinite) 400 
Sets 461-462, 467 
Sexagesimal fractions 379-380 (Ex. 16, 18) 
Shears 9
Simple iteration 240 (Ex. 1) 
Sine, product formula for 402, 415-417 
Singularities of a differentiable map 
definition 134, 141 (Ex. 13) 
examples 134-139 
reviewed 192 
Snell’s Law of Refraction 183 (Ex. 2) 
Solution manifolds of a differential equation 314 
Space as a term not implying three dimensions 80n 
Special relativity 351-354 
Spherical coordinates 45 
differentiation of 150 (Ex. 10) 
Square roots 244 (Ex. 1) 
Stationary points 170n 
Stereographic projection 43-44 (Ex. 3), 47 
Stokes’ Theorem 
proved when domain is a rectangle 60, 67-68 
statement 72-73 
statement and proof 217-218, 220-221 
in vector notation 266-268 
Subscript notation 77n 
Successive approximations 226 
of the solution of linear equations 236 
of the inverse of a matrix 236 
of the solution of an ordinary differential equation 
246 
Surface integrals 266 
Surfaces (see also Manifolds) 
defined by parameters 41 
defined by equations 41 


Tangent line to a curve in the plane 148 (Ex. 4) 
Tangent plane to a surface in space 149 (Ex. 8) 
Taylor series 234 (Ex. 1), 300 
Tensors 23n 
Topology 321, 324 
Torus 
parameterization of 49 (Ex. 7) 
volume of 70 (Ex. 21(c)) 
as a compact, oriented surface 214 (Ex. 3) 
Transcendental functions 383-384 
Transpose of a matrix 98 
Triangle inequality 
for complex numbers 310 (Ex. 1) 
for rational numbers 365 
in Banach spaces 448 
Trigonometric functions 251 (Ex. 1), 311 (Ex. 7) 


‘Uncertainty’ of an approximating sum 32-34 
Uniform continuity 388
Uniform convergence 449
Uniform differentiability 389-390


Vector fields 265 
Vector potential 347 
Vector products (see Cross products) 
Vector spaces
informal definition 113
origin of term 113n
examples 113-114
subspaces 114-115
bases 115-117
dimension 115-117
linear maps 117-122
formal definition 122-123
glossary of terms 123-124
Velocity vectors 147 
Volume, k-dimensional 197 
Volume of an n-dimensional ball 425 (Ex. 12) 


Wallis’ formula 405 (Ex. 5) 
Wave equation 348 
Work 
in a constant force field 1-2 
in a central force field 23 (Ex. 1) 
as the differential of potential 61, 333-340 


